Abstract

We consider the general model of zero-sum repeated games (or stochastic games with 
signals), and assume that one of the players is fully informed and controls the transi- 
tions of the state variable. We prove the existence of the uniform value, generalizing 
several results of the literature. A preliminary existence result is obtained for a certain 
class of stochastic games played with pure strategies. 

Key words. Repeated games, stochastic games, uniform value, incomplete information, 
single controller, Choquet order, Wasserstein distance. 

1 Introduction 

The context of this work is the characterization of repeated game models 
where the value exists. We first consider here general repeated games defined 
with finite sets of states, actions and signals. They contain usual stochastic games, 
standard repeated games with incomplete information and also repeated games 
with signals. At each stage the players will play a matrix game depending on 
a parameter called state, this state is partially known and evolves from stage to 
stage, and after each stage every player receives some private signal on the current 
situation. We make two important hypotheses. We first assume that player 1 is 
informed, in the sense that he can always deduce the current state and player 
2's signal from his own signal. Secondly, we assume that player 1 controls the 
transitions, in the sense that the law of the couple (new state, signal received by 
player 2) does not depend on player 2's actions. We call "repeated games with 
an informed controller" the games satisfying these two hypotheses. 

This class of games contains Markov chain repeated games with lack of
information on one side as studied in Renault, 2006, and is more general since, in
particular, we allow here for transitions of the state depending also on player 1's
actions. It also contains stochastic games with a single controller and incomplete
information on the side of his opponent, as studied in Rosenberg et al., 2004 (see
subsection 3.3.3 here). And a fortiori it contains the standard repeated games
with incomplete information on one side and perfect monitoring introduced by 
Aumann and Maschler. Notice that repeated games with an informed controller
contain (weak) forms of the three main aspects of general repeated game
models : the stochastic aspect (the state evolves from one stage to another and 
is controlled here by player 1), the incomplete information aspect (player 2 has 
an incomplete knowledge of the state), and the signalling aspect (players observe 
signals rather than actions). And we believe that the existence result presented 
here is the first one to significantly deal with these three aspects simultaneously. 
On the other hand, they do not contain stochastic games in which the transitions are
controlled by both players (see Mertens and Neyman, 1981, for the existence of the
uniform value in such games).

We prove the existence of the uniform value via several steps, and several 
games are considered. These are : our original repeated game where player 1 is 
informed and controls the transitions (level 1), an auxiliary stochastic game (level 
2), and finally a one-player repeated game, i.e. a dynamic programming problem 
(level 3). A crucial point is that in our original game, the set of states K is finite. 

The auxiliary stochastic game has the following features. It is played with pure
strategies. The new set of states is $X = \Delta(K)$, the set of probabilities over $K$,
and represents in the original game the belief of player 2 on the current state¹. In
the auxiliary stochastic game, the new state is known to both players, and actions
played are perfectly observed after each stage. It is also convenient to consider
states in $\Delta_f(X)$, the set of probabilities with finite support on $X$. To express an
informational gap, we use the Choquet order of sweeping of probabilities : given
$u$ and $v$ in $\Delta_f(X)$, we say that $u$ is better than $v$, or $v$ is a sweeping of $u$, if for
every concave continuous mapping $f$ from $X$ to the reals, $u(f) \geq v(f)$. And it
is essentially possible to model the original informational advantage of player 1
with a "splitting hypothesis" defined via this order. For the topological part, we
use the weak* topology on the set $\Delta(X)$ of Borel probabilities on $X$, and more
precisely the Wasserstein distance. This allows us to control carefully the Lipschitz
constant of the value functions.

A dynamic programming problem can be derived from the auxiliary stochastic
game. The role of player 2 disappears, and the set of states of the dynamic
programming problem is $Z = \Delta_f(X) \times [0,1]$. $\Delta_f(X)$ is dense in $\Delta(X)$ for the
weak* topology, so $Z$ can be viewed as a precompact metric space. We define, for
every $m$ and $n$, a value $w_{m,n}$ as the supremum payoff player 1 can achieve when
his payoff is defined as the minimum, for $t$ in $\{1, ..., n\}$, of his average rewards
computed between stages $m + 1$ and $m + t$. It is possible to prove that the family
$(w_{m,n})$ is uniformly equicontinuous, and together with the precompactness of the
state space this implies the existence of the uniform value for the dynamic
programming problem. The proof of this implication can be found in a companion
paper (see Corollary 3.8, Renault 2007), which only deals with 1-player games
and can be read independently.

1 The idea of considering an auxiliary stochastic game is certainly not new, see for example
Mertens 1986, Coulomb, 2003 or Mertens et al., 1994, Part A, Ch IV, section 3.

The present paper is organized as follows. In section 2, we consider a particular
class of stochastic games including our auxiliary games of level 2. We think this
class of games may be interesting in itself. It is defined with hypotheses making
no reference to the original finite set $K$, and we prefer to start by presenting this
class, which can be considered as both more general and simpler to study than
the auxiliary stochastic games. We prove that these games have a uniform value
using the result on dynamic programming proved in Renault, 2007. In section
3 we consider our original repeated game and show how the existence of the
uniform value in these games is implied by the existence of the uniform value
for the stochastic games of section 2. Finally, we obtain formulas expressing the
uniform value in terms of the values of some finite games. More precisely, let
$v_{m,n}$ be the value of the game where the global payoff is defined as the average
of the payoffs between stage $m + 1$ and stage $m + n$. We show in particular
that $\inf_{n \geq 1} \sup_{m \geq 0} v_{m,n} = \sup_{m \geq 0} \inf_{n \geq 1} v_{m,n}$, and this is the uniform value (see
subsection 3.3.1). We conclude by discussing several hypotheses and present a
few open problems.

2 A certain class of stochastic games 
2.1 Model 

We consider in this section 2-player zero-sum stochastic games with complete 
information and standard observation, played with pure strategies. We assume 
that after each stage, the new state is selected according to a probability with 
finite support. 

If $X$ is a non empty set, we denote by $\Delta_f(X)$ the set of probabilities on $X$
with finite support.
We consider :

• three non empty sets : a set of states $X$, a set $A$ of actions for player
1, and a set $B$ of actions for player 2,

• an element $u$ in $\Delta_f(X)$, called the initial distribution on states,

• a mapping $g$ from $X \times A \times B$ to $[0, 1]$, called the payoff function of
player 1, and

• a mapping $l$ from $X \times A \times B$ to $\Delta_f(X)$, called the transition function.

The interpretation is the following. The initial state $p_1$ in $X$ is selected according
to $u$, and is announced to both players. Then simultaneously, player 1 chooses
$a_1$ in $A$, and player 2 chooses $b_1$ in $B$. The stage payoff is $g(p_1, a_1, b_1)$ for player
1, and $-g(p_1, a_1, b_1)$ for player 2, then $a_1$ and $b_1$ are publicly observed, and a
new state $p_2$ is selected according to $l(p_1, a_1, b_1)$, etc. At any stage $t \geq 2$, the
state $p_t$ is selected according to $l(p_{t-1}, a_{t-1}, b_{t-1})$, and announced to both players.
Simultaneously, player 1 chooses $a_t$ in $A$ and player 2 chooses $b_t$ in $B$. The stage
payoffs are $g(p_t, a_t, b_t)$ for player 1 and the opposite for player 2. Then $a_t$ and $b_t$
are publicly announced, and the play proceeds to stage $t + 1$.

From now on we fix $\Gamma = (X, A, B, g, l)$, and for every $u$ in $\Delta_f(X)$ we denote
by $\Gamma(u) = (X, A, B, g, l, u)$ the corresponding stochastic game induced by $u$. For
the moment we make no assumption on $\Gamma$. We start with elementary definitions
and notations.

A strategy for player 1 is a sequence $\sigma = (\sigma_n)_{n \geq 1}$, where for each $n$, $\sigma_n$
is a mapping from $(X \times A \times B)^{n-1} \times X$ to $A$, with the interpretation that
$\sigma_n(p_1, a_1, b_1, ..., p_{n-1}, a_{n-1}, b_{n-1}, p_n)$ is the action played by player 1 at stage $n$
after $(p_1, a_1, b_1, ..., p_{n-1}, a_{n-1}, b_{n-1}, p_n)$ occurred. $\sigma_1$ is just a mapping from $X$ to
$A$ giving the first action played by player 1 depending on the initial state.
Similarly, a strategy for player 2 is a sequence $\tau = (\tau_n)_{n \geq 1}$, where for each $n$, $\tau_n$ is
a mapping from $(X \times A \times B)^{n-1} \times X$ to $B$. We denote by $\Sigma$ and $\mathcal{T}$ the sets of
strategies of player 1 and player 2, respectively.

Fix for a while $(u, \sigma, \tau)$, and assume that player 1 plays $\sigma$ whereas player 2
plays $\tau$ in the game $\Gamma(u)$. The initial state $p_1$ is selected according to $u$, then
the first actions are $a_1 = \sigma_1(p_1)$ and $b_1 = \tau_1(p_1)$. $p_2$ is selected according to
$l(p_1, a_1, b_1)$, then $a_2 = \sigma_2(p_1, a_1, b_1, p_2)$, $b_2 = \tau_2(p_1, a_1, b_1, p_2)$, etc. By induction
this defines, for every positive $N$, a probability with finite support on the set
$(X \times A \times B)^N$ corresponding to the set of the first $N$ states and actions. It is
standard that these probabilities can be uniquely extended to a probability $\mathbb{P}_{u,\sigma,\tau}$
on the set of plays $\Omega = (X \times A \times B)^{\infty}$, endowed with the $\sigma$-algebra generated by
the cylinders (one can apply, e.g., theorem 2.7.2. p. 109 in Ash, 1972).

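Definition 2.1. The average expected payoff for player 1 induced by $(\sigma, \tau)$ at the
first $N$ stages in the game $\Gamma(u)$ is denoted by :

$$\gamma_N^u(\sigma, \tau) = \mathbb{E}_{\mathbb{P}_{u,\sigma,\tau}} \left( \frac{1}{N} \sum_{n=1}^{N} g(p_n, a_n, b_n) \right).$$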

Definition 2.2. For $u$ in $\Delta_f(X)$ and $N \geq 1$, the game $\Gamma_N(u)$ is the zero-sum
game with normal form $(\Sigma, \mathcal{T}, \gamma_N^u)$.

$\Gamma_N(u)$ is called the $N$-stage game with initial distribution $u$, and corresponds to
the one-shot game where player 1's strategy set is $\Sigma$, player 2's strategy set is $\mathcal{T}$,
and $\gamma_N^u$ is the payoff function for player 1. It has a value if :
$\sup_{\sigma \in \Sigma} \inf_{\tau \in \mathcal{T}} \gamma_N^u(\sigma, \tau) = \inf_{\tau \in \mathcal{T}} \sup_{\sigma \in \Sigma} \gamma_N^u(\sigma, \tau)$.
A strategy $\sigma$, if any, achieving the supremum on the LHS
is then called an optimal strategy for player 1. Similarly, a strategy $\tau$, if any,
achieving the infimum on the RHS is then called an optimal strategy for player
2.

Notations 2.3.

For $p$ in $X$, we denote by $\delta_p \in \Delta_f(X)$ the Dirac measure on $p$. A probability $u$ in
$\Delta_f(X)$ is written $u = \sum_{p \in X} u(p) \delta_p$, where $u(p)$ is the probability of $p$ under $u$.

For $u$ in $\Delta_f(X)$, if $\Gamma_N(u)$ has a value, we denote it by $\tilde{v}_N(u) \in [0, 1]$. For $p$ in
$X$, if $\Gamma_N(\delta_p)$ has a value, we denote it by $v_N(p)$ and we have $v_N(p) = \tilde{v}_N(\delta_p)$.

Notice that if $p \neq p'$, then $\delta_{1/2\, p + 1/2\, p'} \neq 1/2\, \delta_p + 1/2\, \delta_{p'}$, so we will not identify a
state $p$ with the measure $\delta_p$. When the value of the $N$-stage game exists for every
initial distribution, $\tilde{v}_N$ is a mapping from $\Delta_f(X)$ to $\mathbb{R}$, whereas $v_N$ is a mapping
from $X$ to $\mathbb{R}$. It is easy to see that : $\forall u \in \Delta_f(X)$, $\forall N \geq 1$, $\forall \sigma \in \Sigma$, $\forall \tau \in \mathcal{T}$,

$$\mathbb{P}_{u,\sigma,\tau} = \sum_{p \in X} u(p)\, \mathbb{P}_{\delta_p,\sigma,\tau} \quad \text{and} \quad \gamma_N^u(\sigma, \tau) = \sum_{p \in X} u(p)\, \gamma_N^{\delta_p}(\sigma, \tau).$$
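The decomposition of the payoff over initial states can be checked numerically on a small example. The sketch below is only an illustration: the two-state game, the function name `average_payoff` and the strategies `sigma` and `tau` are hypothetical and not taken from the paper. It computes the average expected payoff of the first stages by enumerating histories with exact rational arithmetic.

```python
from fractions import Fraction


def average_payoff(u, g, l, sigma, tau, n_stages):
    """Expected average payoff of the first n_stages stages under pure strategies.

    u maps states to probabilities, l(p, a, b) maps next states to probabilities,
    and sigma / tau map (history, current state) to an action.
    """

    def expected_sum(history, p, stage):
        # Expected sum of the stage payoffs from `stage` to n_stages.
        a, b = sigma(history, p), tau(history, p)
        total = Fraction(g(p, a, b))
        if stage < n_stages:
            extended = history + ((p, a, b),)
            for q, weight in l(p, a, b).items():
                total += weight * expected_sum(extended, q, stage + 1)
        return total

    return sum(mass * expected_sum((), p, 1) for p, mass in u.items()) / n_stages


# Hypothetical toy game: two states, payoff 1 to player 1 when the actions match.
HALF = Fraction(1, 2)


def g(p, a, b):
    return 1 if a == b else 0


def l(p, a, b):
    return {"x": HALF, "y": HALF}


def sigma(history, p):
    return 0


def tau(history, p):
    return 0 if p == "x" else 1


from_x = average_payoff({"x": Fraction(1)}, g, l, sigma, tau, 2)
from_y = average_payoff({"y": Fraction(1)}, g, l, sigma, tau, 2)
mixed = average_payoff({"x": HALF, "y": HALF}, g, l, sigma, tau, 2)
```

In this toy game the payoff from the mixed initial distribution is the average of the payoffs from the two Dirac initial distributions, as the identity above requires.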

Claim 2.4. If $v_N(p)$ exists for each $p$ in $X$, then $\tilde{v}_N(u)$ exists for every $u$ in
$\Delta_f(X)$ and $\tilde{v}_N(u) = \sum_{p \in X} u(p) v_N(p)$.

We now consider an infinite time horizon. 

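Definition 2.5. Let $u$ be in $\Delta_f(X)$.

The lower (or maxmin) value of $\Gamma(u)$ is :

$$\underline{v}(u) = \sup_{\sigma \in \Sigma} \liminf_{N} \left( \inf_{\tau \in \mathcal{T}} \gamma_N^u(\sigma, \tau) \right).$$

The upper (or minmax) value of $\Gamma(u)$ is :

$$\overline{v}(u) = \inf_{\tau \in \mathcal{T}} \limsup_{N} \left( \sup_{\sigma \in \Sigma} \gamma_N^u(\sigma, \tau) \right).$$

We have $\underline{v}(u) \leq \overline{v}(u)$. $\Gamma(u)$ is said to have a uniform value if and only if $\underline{v}(u) = \overline{v}(u)$, and
in this case the uniform value is $\underline{v}(u) = \overline{v}(u)$.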

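An equivalent definition of the uniform value is as follows. Given a real number
$v$, we say that player 1 can guarantee $v$ in $\Gamma(u)$ if : $\forall \varepsilon > 0$, $\exists \sigma \in \Sigma$, $\exists N_0$, $\forall N \geq N_0$,
$\forall \tau \in \mathcal{T}$, $\gamma_N^u(\sigma, \tau) \geq v - \varepsilon$. Player 2 can guarantee $v$ in $\Gamma(u)$ if : $\forall \varepsilon > 0$, $\exists \tau \in \mathcal{T}$,
$\exists N_0$, $\forall N \geq N_0$, $\forall \sigma \in \Sigma$, $\gamma_N^u(\sigma, \tau) \leq v + \varepsilon$. If player 1 can guarantee $v$ and player
2 can guarantee $w$ then clearly $w \geq v$. We also have :

Claim 2.6. $\underline{v}(u) = \max\{v \in \mathbb{R}$, player 1 can guarantee $v$ in $\Gamma(u)\}$,
$\overline{v}(u) = \min\{v \in \mathbb{R}$, player 2 can guarantee $v$ in $\Gamma(u)\}$.
A real number $v$ can be guaranteed by both players if and only if $v$ is the
uniform value of $\Gamma(u)$.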

Assume now that $\tilde{v}_N(u)$ exists for each $N$. If player 1 (resp. player 2) can
guarantee $v$ then $\liminf_N \tilde{v}_N(u) \geq v$ (resp. $\limsup_N \tilde{v}_N(u) \leq v$). As a consequence
we have :

Claim 2.7. Assume that $\tilde{v}_N(u)$ exists for each $N$.

$$\underline{v}(u) \leq \liminf_{N} \tilde{v}_N(u) \leq \limsup_{N} \tilde{v}_N(u) \leq \overline{v}(u).$$

So in this case, the existence of the uniform value $\underline{v}(u) = \overline{v}(u)$ implies the
existence of the "limit value" $\lim_N \tilde{v}_N(u)$, and all notions coincide : $\underline{v}(u) = \overline{v}(u) =
\lim_N \tilde{v}_N(u)$.

2.2 An existence result for the uniform value 

We are interested in the existence of the uniform value, and are now going to
make some hypotheses on $\Gamma$.

Remark 2.8. We have in mind, in view of the application in section 3, the case of
a repeated game with lack of information on one side where player 1 is informed
and controls the transition. In these games, there is an underlying finite set of
parameters $K$, and $X$ is the set of probabilities over $K$. Initially, a parameter
$k$ is selected according to $p$, and is announced to player 1 only. Then the
parameter may change from stage to stage, but it is always known by player 1 and
its evolution is independent of player 2's actions. It will be possible to check the
following hypotheses H1 to H7 in this model. Our point here is to be more general
and simpler. We want to be able to write a model without any reference to the
underlying set of parameters, and where players use pure strategies.

We first make the very important assumption that only player 1 controls the
transitions :

Hypothesis H1 : the transition $l$ does not depend on player 2's actions, i.e.
$\forall p \in X$, $\forall a \in A$, $\forall b \in B$, $\forall b' \in B$, $l(p, a, b) = l(p, a, b')$.

In the sequel we consider $l$ as a mapping from $X \times A$ to $\Delta_f(X)$, and we write
$l(p, a)$ for the distribution on the next state if the actual state is $p$ and player 1
plays $a$.

Hypothesis H2 : X is a compact convex subset of a normed vector space. 

We denote by $\Delta(X)$ the set of Borel probability measures on $X$, and we will
use on $\Delta(X)$ the weak* topology and the Choquet order. $\Delta_f(X)$ is now seen as a
subset of $\Delta(X)$. We first fix notations and recall some definitions. We start with
the topological aspect : $X$ is in particular a compact metric space, and we denote
by $d(p, q)$ the distance between two elements $p$ and $q$ of $X$.

Notations 2.9. We denote by $E$ the set of continuous mappings from $X$ to the
reals, and by $E_1$ the set of non expansive (i.e. Lipschitz with constant 1) elements
of $E$. For $u$ in $\Delta(X)$ and $f$ in $E$ we write $u(f) = \int_{p \in X} f(p)\, du(p)$. Given $f$ in $E$,
we extend $f$ by duality to an affine mapping $\tilde{f} : \Delta(X) \to \mathbb{R}$ by $\tilde{f}(u) = u(f)$.

In the following, $\Delta(X)$ will always be endowed with the weak* topology : a
sequence $(u_n)_n$ converges to $u$ in $\Delta(X)$ if and only if $u_n(f) \to_{n \to \infty} u(f)$ for
every $f$ in $E$. $\Delta(X)$ is itself compact, the weak* topology can be metrized, and
the set $\Delta_f(X)$ of probabilities on $X$ with finite support is dense in $\Delta(X)$ (see for
example Doob, 1994, Ch.VIII, section 5, and Malliavin, 1995, p. 99).

Remark 2.10. An important distance on $\Delta(X)$ which metrizes the weak*
topology is the following (Fortet-Mourier-)Wasserstein distance, defined by :

$$\forall u \in \Delta(X),\ \forall v \in \Delta(X),\quad d(u, v) = \sup_{f \in E_1} |u(f) - v(f)|.$$

One can check that this distance has the following nice properties. For every $p$,
$q$ in $X$, $d(p, q) = d(\delta_p, \delta_q)$. Moreover, for $f$ in $E$ and $C > 0$, $f$ is $C$-Lipschitz if
and only if $\tilde{f}$ is $C$-Lipschitz.
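As an illustration only, the distance can be computed explicitly in the special case where $X$ is an interval of the real line and both measures have finite support. The sketch below relies on the classical one-dimensional identity stating that the supremum over 1-Lipschitz functions equals the integral of the absolute difference of the two cumulative distribution functions; the function name `wasserstein_1d` and the restriction to the real line are assumptions made for this example, not part of the paper.

```python
from fractions import Fraction


def wasserstein_1d(u, v):
    """Distance sup_f |u(f) - v(f)| over 1-Lipschitz f, for finitely supported
    measures on the real line given as dicts {point: probability}.

    Computed as the integral of |F_u - F_v|, where F denotes the CDF.
    """
    points = sorted(set(u) | set(v))
    distance = Fraction(0)
    cdf_gap = Fraction(0)  # F_u - F_v on the current interval
    for left, right in zip(points, points[1:]):
        cdf_gap += u.get(left, 0) - v.get(left, 0)
        distance += abs(cdf_gap) * (right - left)
    return distance


# Two Dirac measures: the distance is the distance between the points.
dirac_gap = wasserstein_1d({Fraction(1, 4): Fraction(1)}, {Fraction(3, 4): Fraction(1)})

# A Dirac measure at 1/2 against its split into the endpoints 0 and 1.
split_gap = wasserstein_1d(
    {Fraction(1, 2): Fraction(1)},
    {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 2)},
)
```

The first call illustrates the property $d(p, q) = d(\delta_p, \delta_q)$ on the points $1/4$ and $3/4$.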

We will also use the convexity of $X$. In zero-sum games with lack of
information on the side of player 2, it is well known that the value is a concave function of
the parameter $p$ : this fundamental property represents the advantage for player 1
to be informed (see for example, Sorin 2002, proposition 2.2 p. 16). In our setup,
we want the initial distribution $\delta_{1/2\, p + 1/2\, p'}$ to be more advantageous for player 1
than the initial distribution $1/2\, \delta_p + 1/2\, \delta_{p'}$. This is perfectly represented by the
following order :

Definition 2.11. For $u$ and $v$ in $\Delta(X)$, we say that $u$ is better than $v$, or that $v$
is a sweeping of $u$, and we write $u \succeq v$, if :

for every concave mapping $f$ in $E$, $u(f) \geq v(f)$.

This order was introduced by Choquet (1960). It is actually an order on $\Delta(X)$,
the maximal elements are the Dirac measures, and Choquet proved that the
minimal elements are the measures with support in the set of extreme points of
$X$ (see P. A. Meyer, 1966, theorem 24 p. 282). For every $f$ in $E$, we easily have
the equivalence :

Claim 2.12. $f$ is concave if and only if $\tilde{f}$ is non decreasing.

We now define hypotheses H3 to H7.

Hypothesis H3 : A and B are compact convex subsets of topological vector 
spaces. 

Hypothesis H4 : For every $(p, b)$ in $X \times B$, $(a \mapsto g(p, a, b))$ is concave and
upper semi-continuous. For every $(p, a)$ in $X \times A$, $(b \mapsto g(p, a, b))$ is convex and
lower semi-continuous.

We will prove in the sequel a natural dynamic programming principle à la
Shapley (or Bellman).

2 For convenience, we reverse here Choquet's order, i.e. we write $u \succeq v$ instead of $u \preceq v$. For
$u$ and $v$ in $\Delta_f(X)$, a simple characterization of $u \succeq v$ will be stated later, see proposition 2.33.

Notation 2.13. For $f$ in $E$ and $\alpha$ in $[0, 1]$, we define $\Phi(\alpha, f) : X \to \mathbb{R}$ with :

$$\forall p \in X,\quad \Phi(\alpha, f)(p) = \sup_{a \in A} \inf_{b \in B} \left( \alpha\, g(p, a, b) + (1 - \alpha)\, \tilde{f}(l(p, a)) \right).$$
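To make the operator concrete, here is a minimal sketch that evaluates it when the supremum and infimum are taken over finite lists of actions. This is an assumption of the example (hypothesis H3 asks for compact convex action sets), and the toy payoff `g`, the transition `l` and the name `phi` are hypothetical. The affine extension of `f` is computed as a finite weighted sum over the support of `l(p, a)`.

```python
from fractions import Fraction


def phi(alpha, f, g, l, actions_1, actions_2):
    """Return the function p -> max_a min_b (alpha*g(p,a,b) + (1-alpha)*f~(l(p,a)))
    for finite action lists, where l(p, a) is a dict {next state: probability}."""

    def value(p):
        best = None
        for a in actions_1:
            # l does not depend on b, so the minimum only concerns the stage payoff.
            worst_stage = min(g(p, a, b) for b in actions_2)
            continuation = sum(weight * f(q) for q, weight in l(p, a).items())
            candidate = alpha * worst_stage + (1 - alpha) * continuation
            if best is None or candidate > best:
                best = candidate
        return best

    return value


# Hypothetical toy data on X = [0, 1]: the state never moves.
def g(p, a, b):
    return p * a + (1 - p) * b


def l(p, a):
    return {p: Fraction(1)}


def identity(q):
    return q


def zero(q):
    return Fraction(0)


half_step = phi(Fraction(1, 2), identity, g, l, (0, 1), (0, 1))(Fraction(1, 2))
one_stage = phi(Fraction(1), zero, g, l, (0, 1), (0, 1))(Fraction(1, 4))
```

The second call evaluates $\Phi(1, 0)$, which only involves the stage payoff.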

Hypothesis H5 : There exists a subset $\mathcal{D}$ of $E_1$ containing $\Phi(1, 0)$, and such
that $\Phi(\alpha, f) \in \mathcal{D}$ for every $f$ in $\mathcal{D}$ and $\alpha$ in $[0, 1]$.

Hypothesis H6 : For every $(p, b)$ in $X \times B$, $(a \mapsto l(p, a, b))$ is continuous and
concave.

Hypothesis H7 : "Splitting" Consider a convex combination $p = \sum_{s=1}^{S} \lambda_s p_s$ in
the set of states $X$, and a family of actions $(a_s)_{s \in S}$ in $A^S$. Then there exists $a$ in
$A$ such that :

$$l(p, a) \succeq \sum_{s \in S} \lambda_s\, l(p_s, a_s) \quad \text{and} \quad \inf_{b \in B} g(p, a, b) \geq \sum_{s \in S} \lambda_s \inf_{b \in B} g(p_s, a_s, b).$$

H3 and H4 are standard and, by Sion's theorem, will lead to the existence of
the value of the stage game. H5 is very important and will ensure that all value
functions are 1-Lipschitz. We will provide later a simple condition implying H5,
see remark 2.42. H6 is the only hypothesis where the topology on $\Delta(X)$ appears,
and does not depend on a particular distance metrizing the weak* topology. H7 is
the generalization of the well known splitting lemma for games with lack of
information on one side. Under the hypotheses H1,..., H7, our main result in theorem
2.16 will be the existence of the uniform value. We will also obtain several other
properties, which will be expressed via the following notions.



Definition 2.14. A strategy $\sigma = (\sigma_t)_{t \geq 1}$ of player 1 is Markov if for each stage $t$,
$\sigma_t$ only depends on the current state $p_t$. A Markov strategy for player 1 will be seen
as a sequence $\sigma = (\sigma_t)_{t \geq 1}$, where for each $t$, $\sigma_t$ is a mapping from $X$ to $A$ giving the
action to be played on stage $t$ depending on the current state. We denote the set
of Markov strategies for player 1 by : $\Sigma^M = \{\sigma = (\sigma_t)_{t \geq 1}$, with $\forall t$, $\sigma_t : X \to A\}$.
Markov strategies for player 2 are defined similarly, and their set is denoted by $\mathcal{T}^M$.

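Definition 2.15. For $m \geq 0$ and $n \geq 1$, the average expected payoff for player
1 induced by a strategy pair $(\sigma, \tau)$ in $\Sigma \times \mathcal{T}$ from stage $m + 1$ to stage $m + n$ is
denoted by :

$$\gamma_{m,n}^u(\sigma, \tau) = \mathbb{E}_{\mathbb{P}_{u,\sigma,\tau}} \left( \frac{1}{n} \sum_{t=m+1}^{m+n} g(p_t, a_t, b_t) \right).$$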

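Theorem 2.16. Assume that H1,..., H7 hold.

Then for every initial distribution $u$, the game $\Gamma(u)$ has a uniform value $v^*(u)$.
Every player can guarantee $v^*(u)$ with a Markov strategy : $\forall \varepsilon > 0$, $\exists \sigma \in
\Sigma^M$, $\exists \tau \in \mathcal{T}^M$, $\exists N_0$, $\forall N \geq N_0$, $\forall \sigma' \in \Sigma$, $\forall \tau' \in \mathcal{T}$, $\gamma_N^u(\sigma, \tau') \geq v^*(u) - \varepsilon$ and
$\gamma_N^u(\sigma', \tau) \leq v^*(u) + \varepsilon$.

We have :

$$v^*(u) = \inf_{n \geq 1} \sup_{m \geq 0} \tilde{v}_{m,n}(u) = \sup_{m \geq 0} \inf_{n \geq 1} \tilde{v}_{m,n}(u) = \inf_{n \geq 1} \sup_{m \geq 0} w_{m,n}(u) = \sup_{m \geq 0} \inf_{n \geq 1} w_{m,n}(u),$$

where $\tilde{v}_{m,n}(u) = \sup_{\sigma \in \Sigma} \inf_{\tau \in \mathcal{T}} \gamma_{m,n}^u(\sigma, \tau) = \inf_{\tau \in \mathcal{T}} \sup_{\sigma \in \Sigma} \gamma_{m,n}^u(\sigma, \tau)$, and
$w_{m,n}(u) = \sup_{\sigma \in \Sigma} \inf_{\tau \in \mathcal{T}} \min_{t \in \{1, ..., n\}} \gamma_{m,t}^u(\sigma, \tau)$.

For every $m$ and $n$, $\tilde{v}_{m,n}$ and $w_{m,n}$ are non expansive, and $(\tilde{v}_n)_n$ uniformly
converges to $v^*$.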

2.3 Proof of theorem 2.16

We now prove theorem 2.16 and assume that H1,..., H7 hold. In the proof, we
endow $\Delta(X)$ with the Wasserstein distance. We denote by $\mathbb{N}^*$ the set of positive
integers. By H3 and H4, for every $(p, a) \in X \times A$, the infimum is achieved in
$\inf_{b \in B} g(p, a, b)$, and we will simply write $g(p, a)$ for $\min_{b \in B} g(p, a, b)$. For each $p$,
$(a \mapsto g(p, a))$ still is concave and upper semi-continuous.

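Lemma 2.17. For every concave $f$ in $E$ and $\alpha$ in $[0, 1]$, $\Phi(\alpha, f)$ is concave.

Proof : Fix a convex combination $p = \sum_{s=1}^{S} \lambda_s p_s$ in $X$, and consider for each $s$
an element $a_s$ in $A$. By the splitting hypothesis H7, one can find $a$ in $A$ such that
$l(p, a) \succeq \sum_{s \in S} \lambda_s l(p_s, a_s)$ and $g(p, a) \geq \sum_{s \in S} \lambda_s g(p_s, a_s)$. $f$ is concave so $\tilde{f}$ is
non decreasing and $\tilde{f}(l(p, a)) \geq \sum_{s \in S} \lambda_s \tilde{f}(l(p_s, a_s))$. We obtain :

$$\Phi(\alpha, f)(p) \geq \alpha\, g(p, a) + (1 - \alpha)\, \tilde{f}(l(p, a)) \geq \alpha \sum_{s \in S} \lambda_s g(p_s, a_s) + (1 - \alpha) \sum_{s \in S} \lambda_s \tilde{f}(l(p_s, a_s))$$
$$= \sum_{s \in S} \lambda_s \left( \alpha\, g(p_s, a_s) + (1 - \alpha)\, \tilde{f}(l(p_s, a_s)) \right).$$

This holds for every $(a_s)_{s \in S}$, so $\Phi(\alpha, f)(p) \geq \sum_{s \in S} \lambda_s \Phi(\alpha, f)(p_s)$. □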

2.3.1 Value of finite games and the recursive formula. 

Lemma 2.18. For every state $p$ in $X$, the game $\Gamma_1(\delta_p)$ has a value which is :

$$v_1(p) = \max_{a \in A} \min_{b \in B} g(p, a, b) = \min_{b \in B} \max_{a \in A} g(p, a, b) = \Phi(1, 0)(p).$$

$v_1$ is concave and belongs to $\mathcal{D}$. $\tilde{v}_1$ is non decreasing and non expansive.

Proof : Fix $p$ in $X$, and consider the game with normal form $(A, B, g(p, ., .))$.
By H3 and H4, we can apply Sion's theorem (see e.g. Sorin 2002 p. 156, thm A.7)
and obtain that this game has a value and both players have optimal strategies.
By lemma 2.17, we get that $v_1$ is concave, and by H5, we have $v_1 \in \mathcal{D}$. By claim
2.4, for every distribution $u$ the game $\Gamma_1(u)$ has a value which is precisely $\tilde{v}_1(u)$. By
concavity of $v_1$, $\tilde{v}_1$ is non decreasing. Since $v_1 \in E_1$ and we use the Wasserstein
distance, $\tilde{v}_1$ is non expansive. □

We will need to consider not only the $n$-stage games $\Gamma_n(u)$, but a larger family
of games with initial distribution $u$.

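Definition 2.19. Let $\theta = \sum_{t \geq 1} \theta_t \delta_t$ be in $\Delta_f(\mathbb{N}^*)$, i.e. $\theta$ is a probability with
finite support over positive integers. For $u$ in $\Delta_f(X)$, the game $\Gamma_{[\theta]}(u)$ is the
game with normal form $(\Sigma, \mathcal{T}, \gamma_{[\theta]}^u)$, where :

$$\gamma_{[\theta]}^u(\sigma, \tau) = \mathbb{E}_{\mathbb{P}_{u,\sigma,\tau}} \left( \sum_{t=1}^{\infty} \theta_t\, g(p_t, a_t, b_t) \right).$$

If $\theta = 1/n \sum_{t=1}^{n} \delta_t$, $\Gamma_{[\theta]}(u)$ is nothing but $\Gamma_n(u)$. $\Gamma_{[\theta]}(u)$ can be seen as the game
where after the play, a stage $t^*$ is selected according to $\theta$ and then only the payoff
of stage $t^*$ matters. If $\theta = \sum_{t \geq 1} \theta_t \delta_t$, define $\theta^+$ as the law of $t^* - 1$ given that $t^* \geq 2$.
Define arbitrarily $\theta^+ = \theta$ if $\theta_1 = 1$, and otherwise we have $\theta^+ = \sum_{t \geq 1} \frac{\theta_{t+1}}{1 - \theta_1} \delta_t$. We
now write a recursive formula for the value of the games $\Gamma_{[\theta]}(u)$.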
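The shifted probability is easy to compute for a finitely supported probability over the stages. The following sketch uses a hypothetical helper named `shift` and represents a probability over positive integers as a dictionary from stages to weights; both choices are assumptions of this example.

```python
from fractions import Fraction


def shift(theta):
    """Law of t* - 1 given t* >= 2, for theta given as a dict {stage: probability}.

    By convention the input is returned unchanged when theta puts mass 1 on stage 1.
    """
    theta_1 = theta.get(1, 0)
    if theta_1 == 1:
        return dict(theta)
    return {t - 1: weight / (1 - theta_1) for t, weight in theta.items() if t >= 2}


third = Fraction(1, 3)
uniform_3 = {1: third, 2: third, 3: third}
shifted = shift(uniform_3)          # uniform on {1, 2}
dirac_1 = shift({1: Fraction(1)})   # arbitrary convention: unchanged
dirac_2 = shift({2: Fraction(1)})   # Dirac at stage 1
```

For the uniform probability on the first three stages, which corresponds to the 3-stage game, the shift is the uniform probability on the first two stages.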

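Proposition 2.20. For $\theta = \sum_{t \geq 1} \theta_t \delta_t$ in $\Delta_f(\mathbb{N}^*)$ and $u$ in $\Delta_f(X)$, the game
$\Gamma_{[\theta]}(u)$ has a value $\tilde{v}_{[\theta]}(u)$ such that :

$$\forall p \in X,\quad v_{[\theta]}(p) = \Phi(\theta_1, v_{[\theta^+]})(p)$$
$$= \max_{a \in A} \min_{b \in B}\ \theta_1 g(p, a, b) + (1 - \theta_1)\, \tilde{v}_{[\theta^+]}(l(p, a))$$
$$= \min_{b \in B} \max_{a \in A}\ \theta_1 g(p, a, b) + (1 - \theta_1)\, \tilde{v}_{[\theta^+]}(l(p, a)).$$

In $\Gamma_{[\theta]}(u)$, both players have optimal Markov strategies. $v_{[\theta]}$ is concave and belongs
to $\mathcal{D}$. $\tilde{v}_{[\theta]}$ is non decreasing and non expansive.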

Proof : by induction. If $\theta = \delta_1$, lemma 2.18 gives the result.

Fix now $n \geq 2$, and assume that the proposition is true for every $\theta$ with
support included in $\{1, ..., n - 1\}$. Fix a probability $\theta = \sum_{t=1}^{n} \theta_t \delta_t$, and notice
that $\theta^+$ has a support included in $\{1, ..., n - 1\}$. Fix also $p$ in $X$.

Consider the auxiliary zero-sum game $\Gamma'_{[\theta]}(p)$ with normal form $(A, B, f_{[\theta]}^p)$,
where $f_{[\theta]}^p(a, b) = \theta_1 g(p, a, b) + (1 - \theta_1) \tilde{v}_{[\theta^+]}(l(p, a))$. We will apply Sion's theorem
to this game. By H3, $A$ and $B$ are compact convex subsets of topological vector
spaces. For every $a$, $(b \mapsto f_{[\theta]}^p(a, b))$ is convex l.s.c. by H4. Consider now a convex
combination $\lambda a + (1 - \lambda) a'$ in $A$. By H6, we have $l(p, \lambda a + (1 - \lambda) a') \succeq \lambda l(p, a) +
(1 - \lambda) l(p, a')$. By the induction hypothesis, $\tilde{v}_{[\theta^+]}$ is non decreasing, so :

$$\tilde{v}_{[\theta^+]}(l(p, \lambda a + (1 - \lambda) a')) \geq \tilde{v}_{[\theta^+]}(\lambda l(p, a) + (1 - \lambda) l(p, a'))$$
$$= \lambda\, \tilde{v}_{[\theta^+]}(l(p, a)) + (1 - \lambda)\, \tilde{v}_{[\theta^+]}(l(p, a')).$$

Since $g$ is concave in $a$, we obtain that $f_{[\theta]}^p$ is concave in $a$. Regarding continuity,
by H6 and the induction hypothesis, $(a \mapsto \tilde{v}_{[\theta^+]}(l(p, a)))$ is continuous. By H4,
$(a \mapsto g(p, a, b))$ is u.s.c., so $(a \mapsto f_{[\theta]}^p(a, b))$ is u.s.c. By Sion's theorem $\Gamma'_{[\theta]}(p)$ has
a value which is :

$$v'_{[\theta]}(p) = \max_{a \in A} \min_{b \in B}\ \theta_1 g(p, a, b) + (1 - \theta_1)\, \tilde{v}_{[\theta^+]}(l(p, a))$$
$$= \min_{b \in B} \max_{a \in A}\ \theta_1 g(p, a, b) + (1 - \theta_1)\, \tilde{v}_{[\theta^+]}(l(p, a)).$$

Consider now the original game $\Gamma_{[\theta]}(p)$, and a strategy pair $(\sigma, \tau)$ in $\Sigma \times \mathcal{T}$.
Write $a = \sigma_1(p)$, resp. $b = \tau_1(p)$, for the first action played by player 1, resp.
player 2, in $\Gamma_{[\theta]}(p)$. Denote by $\sigma^+_{a,b}$ the continuation strategy issued from $\sigma$ after
$(p, a, b)$ has occurred at stage 1. $\sigma^+_{a,b}$ belongs to $\Sigma$, and plays at stage $n$ after
$(p_1, a_1, b_1, ..., p_n)$ what $\sigma$ plays at stage $n + 1$ after $(p, a, b, p_1, a_1, b_1, ..., p_n)$. Similarly
denote by $\tau^+_{a,b}$ the continuation strategy issued from $\tau$ after $(p, a, b)$ has occurred
at stage 1. It is easy to check that :

$$\gamma_{[\theta]}^{\delta_p}(\sigma, \tau) = \theta_1 g(p, a, b) + (1 - \theta_1)\, \gamma_{[\theta^+]}^{l(p,a)}(\sigma^+_{a,b}, \tau^+_{a,b}).$$

Consequently, in the game $\Gamma_{[\theta]}(p)$ player 1 can guarantee $\max_{a \in A} \min_{b \in B} \theta_1 g(p, a, b) +
(1 - \theta_1) \tilde{v}_{[\theta^+]}(l(p, a))$ by playing a Markov strategy. Similarly player 2 has a Markov
strategy which guarantees $\min_{b \in B} \max_{a \in A} \theta_1 g(p, a, b) + (1 - \theta_1) \tilde{v}_{[\theta^+]}(l(p, a))$. Since
the two quantities coincide, $\Gamma_{[\theta]}(p)$ has a value $v_{[\theta]}(p) = v'_{[\theta]}(p)$, and both players
have Markov optimal strategies.

This implies that for every $u$ in $\Delta_f(X)$, the game $\Gamma_{[\theta]}(u)$ has a value which is
the affine extension $\tilde{v}_{[\theta]}(u)$, and both players have Markov optimal strategies in
$\Gamma_{[\theta]}(u)$. $v_{[\theta]} = \Phi(\theta_1, v_{[\theta^+]})$ and $v_{[\theta^+]}$ is concave, so by lemma 2.17 $v_{[\theta]}$ is concave,
and $\tilde{v}_{[\theta]}$ is non decreasing. By H5 $v_{[\theta]}$ is in $\mathcal{D}$, so is 1-Lipschitz, and finally $\tilde{v}_{[\theta]}$
is non expansive. □

Among the games $\Gamma_{[\theta]}(u)$, the following family will play an important role and
deserves a specific notation.

Definition 2.21. For $m \geq 0$ and $n \geq 1$, $\Gamma_{m,n}(u)$ is the game with normal form
$(\Sigma, \mathcal{T}, \gamma_{m,n}^u)$.

Recall that in definition 2.15, we put $\gamma_{m,n}^u(\sigma, \tau) = \mathbb{E}_{\mathbb{P}_{u,\sigma,\tau}} \left( \frac{1}{n} \sum_{t=m+1}^{m+n} g(p_t, a_t, b_t) \right)$
for each $(\sigma, \tau)$. So $\Gamma_{m,n}(u)$ is nothing but $\Gamma_{[\theta]}(u)$ with $\theta = 1/n \sum_{t=m+1}^{m+n} \delta_t$. We
can apply the previous proposition and denote the value of $\Gamma_{m,n}(u)$ by $\tilde{v}_{m,n}(u)$.
$\tilde{v}_{0,n}$ is just the value of the $n$-stage game $\tilde{v}_n$, and for convenience we put $v_0 = 0$.
We have for all $p$ in $X$, and positive $m$ and $n$ :

$$v_n(p) = \Phi(1/n, v_{n-1})(p) = \frac{1}{n} \max_{a \in A} \min_{b \in B} \left( g(p, a, b) + (n - 1)\, \tilde{v}_{n-1}(l(p, a)) \right)$$
$$= \frac{1}{n} \min_{b \in B} \max_{a \in A} \left( g(p, a, b) + (n - 1)\, \tilde{v}_{n-1}(l(p, a)) \right),$$
$$v_{m,n}(p) = \Phi(0, v_{m-1,n})(p) = \max_{a \in A} \tilde{v}_{m-1,n}(l(p, a)).$$

In $\Gamma_{m,n}(u)$, the players first play $m$ stages to control the state, then they play
$n$ stages for payoffs. Moreover, player 2 does not control the transitions, so he
can play arbitrarily in the first $m$ stages.
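The recursive formula for the value of the $n$-stage game can be run by backward induction when only finitely many states are reachable. The sketch below is a hypothetical toy example rather than a game from the paper: player 2 guesses a hidden parameter whose current belief is $p$, the stage payoff is $p$ or $1 - p$ depending on the guess, and player 1 either keeps the belief unchanged ("stay") or reveals the parameter ("split"). Only the max-min side of the formula is computed, over finite action lists.

```python
from fractions import Fraction
from functools import lru_cache

ACTIONS_1 = ("stay", "split")
ACTIONS_2 = (0, 1)


def g(p, a, b):
    # Stage payoff of player 1; player 2 picks the smaller of p and 1 - p.
    return p if b == 1 else 1 - p


def l(p, a):
    # "stay" keeps the belief, "split" reveals the parameter (belief 0 or 1).
    if a == "stay":
        return {p: Fraction(1)}
    return {Fraction(1): p, Fraction(0): 1 - p}


@lru_cache(maxsize=None)
def v(n, p):
    """v_n(p) = (1/n) max_a min_b ( g(p,a,b) + (n-1) * v~_{n-1}(l(p,a)) ), with v_0 = 0."""
    if n == 0:
        return Fraction(0)
    best = None
    for a in ACTIONS_1:
        continuation = sum(weight * v(n - 1, q) for q, weight in l(p, a).items())
        worst_stage = min(g(p, a, b) for b in ACTIONS_2)
        candidate = (worst_stage + (n - 1) * continuation) / n
        if best is None or candidate > best:
            best = candidate
    return best


value_half = v(3, Fraction(1, 2))
value_quarter = v(2, Fraction(1, 4))
value_revealed = v(3, Fraction(1))
```

In this toy game revealing the parameter is never profitable, so the value at belief $p$ stays equal to $\min(p, 1 - p)$ for every horizon.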
Lemma 2.22. Fix $n \geq 1$. There exists a Markov strategy $\tau = (\tau_t)_{t \geq 1}$ for player
2 such that $\forall m \geq 0$, $\forall \tau' = (\tau'_t)_{t \geq 1}$ in $\mathcal{T}$ :

the condition ($\forall l = 1, ..., n$, $\forall p \in X$, $\tau'_{m+l}(., ..., p) = \tau_l(p)$) implies that for
every $u$ in $\Delta_f(X)$, $\tau'$ is an optimal strategy for player 2 in $\Gamma_{m,n}(u)$.

Proof : For each $t$ in $\{1, ..., n\}$, define $\tau_t$ as the mapping which plays, if the
current state is $p \in X$, an element $b$ in $B$ achieving the minimum in :

$$\min_{b \in B} \max_{a \in A} \frac{1}{n - t + 1} \left( g(p, a, b) + (n - t)\, \tilde{v}_{n-t}(l(p, a)) \right) = v_{n+1-t}(p).$$

Using the previous recursive formula, one can show by induction that this
construction of $\tau$ is appropriate. □

2.3.2 Player 2 can guarantee $v^*(u)$ in $\Gamma(u)$.

We now consider the game with infinitely many stages $\Gamma(u)$. The following
results are similar to propositions 7.7 and 7.8 in Renault, 2006.

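Proposition 2.23. In $\Gamma(u)$, player 2 can guarantee with Markov strategies the
quantity :

$$\inf_{n \geq 1} \limsup_{T} \frac{1}{T} \sum_{t=0}^{T-1} \tilde{v}_{nt,n}(u).$$

Proof : Fix $n \geq 1$, and consider $\tau_1, ..., \tau_n$ given by lemma 2.22. Divide the set of
stages $\mathbb{N}^*$ into consecutive blocks $B^1, B^2, ..., B^m, ...$ of equal length $n$. By lemma
2.22 the cyclic strategy $\tau' = (\tau_1, ..., \tau_n, \tau_1, ..., \tau_n, \tau_1, ..., \tau_n, ...)$ is optimal for player
2 in the game $\Gamma_{nm,n}(u)$, for each $m \geq 0$. $\tau'$ is a Markov strategy for player 2, and
for every strategy $\sigma$ of player 1 in $\Sigma$ we have :

$$\forall m \geq 0,\quad \mathbb{E}_{\mathbb{P}_{u,\sigma,\tau'}} \left( \frac{1}{n} \sum_{t=nm+1}^{n(m+1)} g(p_t, a_t, b_t) \right) \leq \tilde{v}_{nm,n}(u),$$

$$\forall M \geq 1,\quad \mathbb{E}_{\mathbb{P}_{u,\sigma,\tau'}} \left( \frac{1}{nM} \sum_{t=1}^{nM} g(p_t, a_t, b_t) \right) \leq \frac{1}{M} \sum_{m=0}^{M-1} \tilde{v}_{nm,n}(u).$$

And since $n$ is fixed and payoffs are bounded, we obtain that player 2 can
guarantee with Markov strategies : $\limsup_M \frac{1}{M} \sum_{m=0}^{M-1} \tilde{v}_{nm,n}(u)$. □

This proof also shows the following inequality.

Lemma 2.24. $\forall u \in \Delta_f(X)$, $\forall n \geq 1$, $\forall T \geq 1$, $\tilde{v}_{nT}(u) \leq \frac{1}{T} \sum_{t=0}^{T-1} \tilde{v}_{nt,n}(u)$.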

The following quantity will turn out to be the value of $\Gamma(u)$.

Definition 2.25. For every $u$ in $\Delta_f(X)$, we define :

$$v^*(u) = \inf_{n \geq 1} \sup_{m \geq 0} \tilde{v}_{m,n}(u).$$

Since $v^*(u) \geq \inf_{n \geq 1} \limsup_T \frac{1}{T} \sum_{t=0}^{T-1} \tilde{v}_{nt,n}(u)$, by proposition 2.23 player 2 can
also guarantee $v^*(u)$ with Markov strategies in $\Gamma(u)$. By claim 2.6, $\overline{v}(u) = \min\{v \in
\mathbb{R}$, player 2 guarantees $v$ in $\Gamma(u)\}$, so we now have the following inequality
chain :

$$\underline{v}(u) \leq \liminf_{N} \tilde{v}_N(u) \leq \limsup_{N} \tilde{v}_N(u) \leq \overline{v}(u) \leq v^*(u).$$



2.3.3 Markov strategies for player 1. 

By H1 player 2 does not control the transition, so a Markov strategy $\sigma$
induces, together with the initial distribution $u$, a probability distribution $\mathbb{P}_{u,\sigma}$
over $(X \times A)^{\infty}$, i.e. over sequences of states and actions for player 1. For $u$ in
$\Delta_f(X)$ and $\sigma_1 : X \to A$, we denote by $H(u, \sigma_1)$ the law of the state of stage 2
if the initial distribution is $u$ and player 1 plays at stage 1 according to $\sigma_1$. We
denote by $G(u, \sigma_1)$ the payoff guaranteed by $\sigma_1$ at stage 1. And we also define
the continuation strategy $\sigma^+$.

Notations 2.26.

$G(u, \sigma_1) = \sum_{p \in X} u(p)\, g(p, \sigma_1(p))$, and $H(u, \sigma_1) = \sum_{p \in X} u(p)\, l(p, \sigma_1(p)) \in \Delta_f(X)$.
If $\sigma = (\sigma_t)_{t \geq 1}$ is in $\Sigma^M$, we write $\sigma^+$ for the Markov strategy $(\sigma_t)_{t \geq 2}$.

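We now concentrate on what player 1 can achieve in $\Gamma(u)$ and completely
forget player 2. We use similar notations as in definition 2.19.

Definition 2.27. For $\theta = \sum_{t \geq 1} \theta_t \delta_t$ in $\Delta_f(\mathbb{N}^*)$, $u$ in $\Delta_f(X)$ and $\sigma$ in $\Sigma^M$, we
put :

$$\gamma_{[\theta]}^u(\sigma) = \mathbb{E}_{\mathbb{P}_{u,\sigma}} \left( \sum_{t \geq 1} \theta_t\, g(p_t, a_t) \right) = \sum_{t \geq 1} \theta_t\, \gamma_{[\delta_t]}^u(\sigma).$$

For simplicity, we write $\gamma_{[t]}^u$ instead of $\gamma_{[\delta_t]}^u$ for the payoff induced at stage $t$. Clearly,
we have for every $t \geq 2$ :

$$\gamma_{[t]}^u(\sigma) = \gamma_{[t-1]}^{H(u,\sigma_1)}(\sigma^+).$$

Lemma 2.28. $\gamma_{[\theta]}^u(\sigma) = \min_{\tau \in \mathcal{T}} \gamma_{[\theta]}^u(\sigma, \tau)$.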

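The proof is easy, the minimum on the RHS being achieved by a Markov
strategy $\tau$ such that for every $t$ and $p_t$, $\tau_t(p_t)$ achieves the minimum in $b$ of the
quantity $g(p_t, \sigma_t(p_t), b)$. As a corollary of lemma 2.28 we obtain that $\sup_{\sigma \in \Sigma^M} \gamma_{[\theta]}^u(\sigma) =
\sup_{\sigma \in \Sigma^M} \inf_{\tau \in \mathcal{T}} \gamma_{[\theta]}^u(\sigma, \tau)$. By proposition 2.20 $\Gamma_{[\theta]}(u)$ has a value, so we get :

Corollary 2.29.

For every $\theta$ in $\Delta_f(\mathbb{N}^*)$ and $u$ in $\Delta_f(X)$, $\sup_{\sigma \in \Sigma^M} \gamma_{[\theta]}^u(\sigma) = \tilde{v}_{[\theta]}(u)$.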

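As in definition 2.21 we now specify notations for a particular class of
probabilities.

Definition 2.30. For $n \geq 1$, $m \geq 0$, $u$ in $\Delta_f(X)$ and $\sigma$ in $\Sigma^M$, we put

$$\gamma_{m,n}^u(\sigma) = \mathbb{E}_{\mathbb{P}_{u,\sigma}} \left( \frac{1}{n} \sum_{t=1}^{n} g(p_{m+t}, a_{m+t}) \right) = \frac{1}{n} \sum_{t=1}^{n} \gamma_{[m+t]}^u(\sigma).$$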

Finally, we consider a situation where player 1 does not precisely know the 
length of the game. 

Definition 2.31. Fix $m \geq 0$, $n \geq 1$, and $u$ in $\Delta_f(X)$. We define :

$$w_{m,n}(u) = \sup_{\sigma \in \Sigma^M} \min \{ \gamma_{m,t}^u(\sigma),\ t \in \{1, ..., n\} \}.$$
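For a fixed Markov strategy, the quantity inside the supremum only depends on the sequence of expected stage payoffs. The sketch below computes this inner minimum from such a sequence; the name `min_average` and the list representation of the stage payoffs are assumptions of the example. The supremum over strategies, which defines $w_{m,n}(u)$ itself, is not computed here.

```python
from fractions import Fraction


def min_average(stage_payoffs, m, n):
    """Minimum over t in {1, ..., n} of the average payoff between stages m+1 and m+t.

    stage_payoffs[t - 1] is the expected payoff of stage t.
    """
    running = Fraction(0)
    worst = None
    for t in range(1, n + 1):
        running += stage_payoffs[m + t - 1]
        average = running / t
        if worst is None or average < worst:
            worst = average
    return worst


payoffs = [1, 0, 1, 1]
after_one_stage = min_average(payoffs, 1, 3)  # averages 0, 1/2, 2/3
from_start = min_average(payoffs, 0, 2)       # averages 1, 1/2
late_start = min_average(payoffs, 2, 2)       # averages 1, 1
```

Taking the minimum over $t$ models a player 1 who does not know when the evaluation of his payoffs stops, so a strategy is judged by its worst running average.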

The mappings $w_{m,n}$ will play an important role in the sequel, while applying
corollary 3.8 of Renault, 2007. We will show in corollary 2.38 that they are non
expansive. To prove corollary 2.38 we will use the following lemma 2.34 and
propositions 2.33, 2.35 and 2.36. We start by defining an auxiliary game³.

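Definition 2.32. For $m \geq 0$, $n \geq 1$, and $u$ in $\Delta_f(X)$, we define $A(m,n,u)$ as
the zero-sum game with normal form $(\Sigma^M, \Delta(\{1, ..., n\}), f)$, where :

$$\forall \sigma \in \Sigma^M,\ \forall \theta \in \Delta(\{1, ..., n\}), \quad f(\sigma, \theta) = \sum_{t=1}^{n} \theta_t\, \gamma^u_{m,t}(\sigma).$$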



We will prove later that $A(m,n,u)$ has a value which is $w_{m,n}(u)$ (see proposition
2.36 below). Notice that in general $f(\cdot, \theta)$ is not concave in $\sigma$. However, we
will show that it is concave-like in $\sigma$, i.e. that : $\forall \sigma'$, $\forall \sigma''$, $\forall \lambda \in [0, 1]$, there exists $\sigma$
such that $\forall \theta$, $f(\sigma, \theta) \geq \lambda f(\sigma', \theta) + (1 - \lambda) f(\sigma'', \theta)$. We start with a characterization
for the partial order $\succeq$.

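Proposition 2.33. Let $u$ and $v$ be in $\Delta_f(X)$. Write $u = \sum_{p \in X} u(p) \delta_p$. The
following conditions are equivalent :

(i) $u \succeq v$, and

(ii) For every $p$ such that $u(p) > 0$, there exist $S(p) \geq 1$, $\lambda^p_1, ..., \lambda^p_{S(p)} \geq 0$
and $q^p_1, ..., q^p_{S(p)}$ in $X$ such that : $\sum_{s=1}^{S(p)} \lambda^p_s = 1$, $\sum_{s=1}^{S(p)} \lambda^p_s q^p_s = p$, and
$v = \sum_{p \in X} u(p) \sum_{s=1}^{S(p)} \lambda^p_s \delta_{q^p_s}$.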

The proof can be easily deduced from a theorem of Loomis (see Meyer, 1966, 
T26 p. 283), which deals with positive measures on X. Notice that condition (ii) 
can be seen as follows : $u$ is the law of some random variable $X_1$ with values in
$X$, $v$ is the law of some random variable $X_2$ with values in $X$, and we have the
martingale condition : $\mathbb{E}(X_2 | X_1) = X_1$.

In general, $\gamma^u_{[t]}(\sigma)$ is not a non decreasing function of $u$. However, we have the
following property.



3 We proceed similarly as in section 6.2. of Renault, 2007. However the situation is more 
technical here, and it will not be possible to apply a standard minmax theorem to the game 
A(m, n, u). 
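Lemma 2.34. Let $n \geq 1$, $u$ and $v$ be in $\Delta_f(X)$ such that $u \succeq v$. For every
$\sigma \in \Sigma^M$, there exists $\sigma' \in \Sigma^M$ such that :

$$\forall t \in \{1, ..., n\}, \quad \gamma^u_{[t]}(\sigma') \geq \gamma^v_{[t]}(\sigma).$$

Proof : by induction on $n$.

If $n = 1$, there exists $\sigma'$ such that $\gamma^u_{[1]}(\sigma') = v_1(u)$. Since $v_1$ is non decreasing,
$v_1(u) \geq v_1(v) \geq \gamma^v_{[1]}(\sigma)$.

Fix now $n \geq 1$, and assume that the lemma is proved for $n$. Fix $u$ and
$v$ in $\Delta_f(X)$ with $u \succeq v$, and $\sigma$ in $\Sigma^M$. We have $u = \sum_{p \in X} u(p) \delta_p$, and by
proposition 2.33 it is possible to write $v = \sum_{p \in X} u(p) \left( \sum_{s=1}^{S(p)} \lambda^p_s \delta_{q^p_s} \right)$, with $\lambda^p_s \geq 0$,
$\sum_{s=1}^{S(p)} \lambda^p_s = 1$, and $\sum_{s=1}^{S(p)} \lambda^p_s q^p_s = p$ for each $p$ such that $u(p) > 0$. Define for every
such $p$ and $s$, $a^p_s = \sigma_1(q^p_s)$. By the splitting hypothesis H7, for every $p$ one can
find $a^p \in A$ such that :

$$l(p, a^p) \succeq \sum_{s=1}^{S(p)} \lambda^p_s\, l(q^p_s, a^p_s) \quad \text{and} \quad g(p, a^p) \geq \sum_{s=1}^{S(p)} \lambda^p_s\, g(q^p_s, a^p_s).$$

We define what $\sigma'$ plays at stage 1 if the state is $p$ as : $\sigma'_1(p) = a^p$.

We have : $\gamma^u_{[1]}(\sigma') = \sum_p u(p)\, g(p, a^p) \geq \sum_p u(p) \sum_s \lambda^p_s\, g(q^p_s, a^p_s) = \gamma^v_{[1]}(\sigma)$,

and $H(u, \sigma'_1) = \sum_p u(p)\, l(p, a^p) \succeq \sum_p u(p) \sum_s \lambda^p_s\, l(q^p_s, a^p_s) = H(v, \sigma_1)$.

Since $H(u, \sigma'_1) \succeq H(v, \sigma_1)$, we apply the induction hypothesis to the continuation
strategy $\sigma^+$. We obtain the existence of some Markov strategy $\tau = (\tau_t)_{t \geq 1}$
such that : $\forall t \in \{1, ..., n\}$, $\gamma^{H(u, \sigma'_1)}_{[t]}(\tau) \geq \gamma^{H(v, \sigma_1)}_{[t]}(\sigma^+)$. Define $\sigma'_t = \tau_{t-1}$ for
each $t \geq 2$. $\sigma' = (\sigma'_t)_{t \geq 1}$ is a Markov strategy for player 1, and satisfies :
$\gamma^u_{[1]}(\sigma') \geq \gamma^v_{[1]}(\sigma)$, and for $t \in \{2, ..., n + 1\}$ :

$$\gamma^u_{[t]}(\sigma') = \gamma^{H(u, \sigma'_1)}_{[t-1]}(\tau) \geq \gamma^{H(v, \sigma_1)}_{[t-1]}(\sigma^+) = \gamma^v_{[t]}(\sigma). \qquad \square$$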

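Lemma 2.34 is now improved.

Proposition 2.35. Let $n \geq 1$, $\lambda \in [0,1]$, $u$, $u'$ and $u''$ in $\Delta_f(X)$ be such that
$u \succeq \lambda u' + (1 - \lambda) u''$. For every $\sigma'$ and $\sigma''$ in $\Sigma^M$, there exists $\sigma \in \Sigma^M$ such that :

$$\forall t \in \{1, ..., n\}, \quad \gamma^u_{[t]}(\sigma) \geq \lambda \gamma^{u'}_{[t]}(\sigma') + (1 - \lambda) \gamma^{u''}_{[t]}(\sigma'').$$

Proof : by induction on $n$.

If $n = 1$, there exists $\sigma$ such that $\gamma^u_{[1]}(\sigma) = v_1(u) \geq v_1(\lambda u' + (1 - \lambda) u'')$
$= \lambda v_1(u') + (1 - \lambda) v_1(u'') \geq \lambda \gamma^{u'}_{[1]}(\sigma') + (1 - \lambda) \gamma^{u''}_{[1]}(\sigma'')$.

Assume the proposition is proved for some $n \geq 1$, and fix $u$, $u'$, $u''$, $\lambda$, $\sigma'$ and
$\sigma''$ with $u \succeq \lambda u' + (1 - \lambda) u''$. Write $v = \lambda u' + (1 - \lambda) u''$. By lemma 2.34 it is enough
to find $\sigma$ in $\Sigma^M$ such that : $\forall t \in \{1, ..., n+1\}$, $\gamma^v_{[t]}(\sigma) \geq \lambda \gamma^{u'}_{[t]}(\sigma') + (1 - \lambda) \gamma^{u''}_{[t]}(\sigma'')$.

We have $v = \sum_p (\lambda u'(p) + (1 - \lambda) u''(p)) \delta_p$, and $v(p) = \lambda u'(p) + (1 - \lambda) u''(p)$
for each $p$. For every $p$ such that $v(p) > 0$, we define :

$$\sigma_1(p) = \frac{\lambda u'(p)}{v(p)} \sigma'_1(p) + \frac{(1 - \lambda) u''(p)}{v(p)} \sigma''_1(p).$$

$\sigma_1(p)$ belongs to $A$ by convexity. Now,

$$\lambda \gamma^{u'}_{[1]}(\sigma') + (1 - \lambda) \gamma^{u''}_{[1]}(\sigma'') = \lambda \sum_p u'(p)\, g(p, \sigma'_1(p)) + (1 - \lambda) \sum_p u''(p)\, g(p, \sigma''_1(p))$$
$$= \sum_p v(p) \left( \frac{\lambda u'(p)}{v(p)} g(p, \sigma'_1(p)) + \frac{(1 - \lambda) u''(p)}{v(p)} g(p, \sigma''_1(p)) \right)$$
$$\leq \sum_p v(p)\, g(p, \sigma_1(p)) = \gamma^v_{[1]}(\sigma),$$

where the inequality comes from the concavity of $g$ in the variable $a$ (see H4).
Proceeding in the same way with distributions on the second state, we obtain via
the concavity of $l$ in $a$ (see H6) :

$$\lambda H(u', \sigma'_1) + (1 - \lambda) H(u'', \sigma''_1) = \lambda \sum_p u'(p)\, l(p, \sigma'_1(p)) + (1 - \lambda) \sum_p u''(p)\, l(p, \sigma''_1(p))$$
$$= \sum_p v(p) \left( \frac{\lambda u'(p)}{v(p)} l(p, \sigma'_1(p)) + \frac{(1 - \lambda) u''(p)}{v(p)} l(p, \sigma''_1(p)) \right)$$
$$\preceq \sum_p v(p)\, l(p, \sigma_1(p)) = H(v, \sigma_1).$$

Consequently, we have $H(v, \sigma_1) \succeq \lambda H(u', \sigma'_1) + (1 - \lambda) H(u'', \sigma''_1)$, and by the induction
hypothesis there exists $\sigma^+$ in $\Sigma^M$ such that $\forall t \in \{1, ..., n\}$,
$\gamma^{H(v, \sigma_1)}_{[t]}(\sigma^+) \geq \lambda \gamma^{H(u', \sigma'_1)}_{[t]}(\sigma'^+) + (1 - \lambda) \gamma^{H(u'', \sigma''_1)}_{[t]}(\sigma''^+)$.
We naturally define $\sigma = (\sigma_1, \sigma^+)$ and we have, for $t \in \{2, ..., n+1\}$ :

$$\gamma^v_{[t]}(\sigma) = \gamma^{H(v, \sigma_1)}_{[t-1]}(\sigma^+) \geq \lambda \gamma^{H(u', \sigma'_1)}_{[t-1]}(\sigma'^+) + (1 - \lambda) \gamma^{H(u'', \sigma''_1)}_{[t-1]}(\sigma''^+) = \lambda \gamma^{u'}_{[t]}(\sigma') + (1 - \lambda) \gamma^{u''}_{[t]}(\sigma''). \qquad \square$$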

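Proposition 2.36. For every $m \geq 0$, $n \geq 1$ and $u$ in $\Delta_f(X)$, the game $A(m, n, u)$
has a value which is $w_{m,n}(u)$.

Proof : Recall that the payoff function in $A(m, n, u)$ is : $f(\sigma, \theta) = \sum_{t=1}^{n} \theta_t \gamma^u_{m,t}(\sigma)$
for every $\sigma$ in $\Sigma^M$ and $\theta$ in $\Delta(\{1, ..., n\})$.

$\Delta(\{1, ..., n\})$ is convex and compact, and $f$ is affine continuous in $\theta$. We
now show that $f$ is concave-like in $\sigma$. Let $\sigma'$, $\sigma''$ be in $\Sigma^M$, and $\lambda \in [0, 1]$. By
the previous proposition, there exists $\sigma$ in $\Sigma^M$ such that : $\forall t \in \{1, ..., m + n\}$,
$\gamma^u_{[t]}(\sigma) \geq \lambda \gamma^u_{[t]}(\sigma') + (1 - \lambda) \gamma^u_{[t]}(\sigma'')$. So

$$f(\sigma, \theta) = \sum_{t=1}^{n} \frac{\theta_t}{t} \sum_{t'=1}^{t} \gamma^u_{[m+t']}(\sigma) \geq \sum_{t=1}^{n} \frac{\theta_t}{t} \sum_{t'=1}^{t} \left( \lambda \gamma^u_{[m+t']}(\sigma') + (1 - \lambda) \gamma^u_{[m+t']}(\sigma'') \right) = \lambda f(\sigma', \theta) + (1 - \lambda) f(\sigma'', \theta).$$

By a theorem of Fan (1953, see proposition A.13 p. 160 in Sorin 2002), $A(m, n, u)$ has
a value which is : $\sup_{\sigma \in \Sigma^M} \inf_{\theta \in \Delta(\{1, ..., n\})} f(\sigma, \theta) = w_{m,n}(u)$. $\square$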

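Notation 2.37. For $\theta = \sum_{t=1}^{n} \theta_t \delta_t \in \Delta(\{1, ..., n\})$, and $m \geq 0$, we define $\theta^{m,n}$
in $\Delta(\{1, ..., m + n\})$ by :

$\theta^{m,n}_s = 0$ if $s \leq m$, and $\theta^{m,n}_s = \sum_{t=s-m}^{n} \frac{\theta_t}{t}$ if $m < s \leq n + m$.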
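
As a sanity check on Notation 2.37, the following sketch computes the shifted weights $\theta^{m,n}$ and compares the two ways of writing $f(\sigma, \theta)$: as $\sum_t \theta_t \gamma_{m,t}$ and as $\sum_s \theta^{m,n}_s \gamma_{[s]}$. The stage payoffs are passed in as a plain list (an assumption made for illustration; in the paper they are the quantities $\gamma^u_{[s]}(\sigma)$), and the function names are hypothetical.

```python
from fractions import Fraction

def shifted_weights(theta, m):
    """Return [theta^{m,n}_1, ..., theta^{m,n}_{m+n}] for theta = [theta_1, ..., theta_n]."""
    n = len(theta)
    weights = [Fraction(0)] * m  # no weight on the first m stages
    for s in range(m + 1, m + n + 1):
        weights.append(sum(Fraction(theta[t - 1]) / t for t in range(s - m, n + 1)))
    return weights

def f_by_averages(theta, m, stage_payoffs):
    """sum_t theta_t * gamma_{m,t}, with gamma_{m,t} the average payoff of stages m+1..m+t."""
    total = Fraction(0)
    for t in range(1, len(theta) + 1):
        average = sum(stage_payoffs[m:m + t]) * Fraction(1, t)
        total += theta[t - 1] * average
    return total

def f_by_weights(theta, m, stage_payoffs):
    """sum_s theta^{m,n}_s * gamma_[s]."""
    weights = shifted_weights(theta, m)
    return sum(w * x for w, x in zip(weights, stage_payoffs))
```

For instance, with $\theta = (1/2, 1/2)$ and $m = 1$, the shifted weights are $(0, 3/4, 1/4)$.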
Corollary 2.38. For every $m \geq 0$, $n \geq 1$ and $u$ in $\Delta_f(X)$,

$$w_{m,n}(u) = \inf_{\theta \in \Delta(\{1, ..., n\})} v_{[\theta^{m,n}]}(u).$$

The mapping $w_{m,n}$ is non decreasing and non expansive.

Proof : Fix $m$, $n$ and $u$. $w_{m,n}(u)$ is the value of $A(m, n, u)$, so we have :
$w_{m,n}(u) = \inf_{\theta \in \Delta(\{1, ..., n\})} \sup_{\sigma \in \Sigma^M} f(\sigma, \theta)$. But $f(\sigma, \theta) = \sum_{t=1}^{n} \theta_t \gamma^u_{m,t}(\sigma) = \gamma^u_{[\theta^{m,n}]}(\sigma)$. So we
obtain : $w_{m,n}(u) = \inf_{\theta \in \Delta(\{1, ..., n\})} \sup_{\sigma \in \Sigma^M} \gamma^u_{[\theta^{m,n}]}(\sigma)$. Corollary 2.29 gives
$w_{m,n}(u) = \inf_{\theta \in \Delta(\{1, ..., n\})} v_{[\theta^{m,n}]}(u)$. For each $\theta$, $v_{[\theta^{m,n}]}$ is non decreasing and non expansive by
proposition 2.20, hence the result. $\square$



2.3.4 A dynamic programming problem. 

We will conclude the proof of theorem 2.16 and show that the uniform value
exists. Fix the initial distribution $u$, our stochastic game is defined by $\Gamma(u) =
(X, A, B, g, l, u)$. We now define an auxiliary MDP as follows.

Definition 2.39. The MDP $\Psi(z_0)$ is defined as $(Z, F, r, z_0)$, where :

• $Z = \Delta_f(X) \times [0, 1]$ is the set of states of the MDP,

• $z_0 = (u, 0)$ is the initial state in $Z$,

• $r$ is the payoff function from $Z$ to $[0, 1]$ defined by $r(u, y) = y$ for each $(u, y)$ in
$Z$,

• and the transition function $F$ is the correspondence from $Z$ to $Z$ such that :
$\forall z = (u, y) \in Z$,

$$F(z) = \left\{ \left( \sum_{p \in X} u(p)\, l(p, a(p)),\ \sum_{p \in X} u(p)\, g(p, a(p)) \right),\ \forall p\ a(p) \in A \right\}$$
$$= \{ (H(u, f), G(u, f)),\ f \text{ is a mapping from } X \text{ to } A \}.$$

Notice that $F(u, y)$ does not depend on $y$, hence the value functions will not
depend on $y$. $F$ has non empty values. Even with strong assumptions on $l$ and $g$, $F$
may not have a compact graph, because in the definition of $F(z)$ we have a unique
$a(p)$ for each $p$. So even if $q$ is close to $p$, the image by $F$ of $(\frac{1}{2} \delta_p + \frac{1}{2} \delta_q, 0)$
may be quite larger than $F(\delta_p, 0)$.

As in Renault, 2007, we denote by $S(z_0) = \{ s = (z_1, ..., z_t, ...) \in Z^\infty,\ \forall t \geq
1,\ z_t \in F(z_{t-1}) \}$ the set of plays at $z_0$. The next proposition shows the strong
links between the stochastic game $\Gamma(u)$ and the MDP $\Psi(z_0)$.

Proposition 2.40. a) For every Markov strategy $\sigma$ in $\Sigma^M$, there exists a play
$s = (z_1, ..., z_t, ...) \in S(z_0)$ such that $\forall t \geq 1$, $\gamma^u_{[t]}(\sigma) = r(z_t)$.

b) Reciprocally, for every play $s = (z_1, ..., z_t, ...) \in S(z_0)$, there exists a Markov
strategy $\sigma$ in $\Sigma^M$ such that : $\forall t \geq 1$, $\gamma^u_{[t]}(\sigma) = r(z_t)$.

Proof : a) Take a Markov strategy $\sigma = (\sigma_t)_{t \geq 1}$ in $\Sigma^M$. Put $u_1 = u$ and $y_0 = 0$,
so that $z_0 = (u_1, y_0)$. Define by induction, for every $t \geq 1$, $u_{t+1} = H(u_t, \sigma_t)$,
$y_t = G(u_t, \sigma_t)$, and $z_t = (u_{t+1}, y_t) \in F(z_{t-1})$. $s = (z_t)_{t \geq 1}$ is a play at $z_0$.

$\gamma^u_{[1]}(\sigma) = G(u_1, \sigma_1) = y_1 = r(z_1)$, and for $t \geq 2$, $\gamma^u_{[t]}(\sigma) = \gamma^{u_2}_{[t-1]}(\sigma^+)
= \gamma^{u_3}_{[t-2]}(\sigma^{++}) = ... = \gamma^{u_t}_{[1]}((\sigma_{t'})_{t' \geq t}) = G(u_t, \sigma_t) = y_t = r(z_t)$.

b) Take a play $s = (z_t)_{t \geq 1}$ at $z_0$. Write for each $t \geq 0$, $z_t = (u_{t+1}, y_t) \in \Delta_f(X) \times
[0,1]$. For every $t \geq 1$, there exists a mapping $f_t$ from $X$ to $A$ which defines
$z_t = (u_{t+1}, y_t)$ in terms of $z_{t-1}$, i.e. such that $u_{t+1} = H(u_t, f_t)$, and $y_t = G(u_t, f_t)$.
Simply define $\sigma = (f_t)_{t \geq 1}$. As in point a), one can check that $\gamma^u_{[t]}(\sigma) = r(z_t)$ for
each positive $t$. $\square$

For any $m \geq 0$, $n \geq 1$, and $s = (z_t)_{t \geq 1}$ in $S(z_0)$, we put as in definitions 3.1
and 3.2 of Renault, 2007 :

$\gamma_{m,n}(s) = \frac{1}{n} \sum_{t=1}^{n} r(z_{m+t})$,

$\nu_{m,n}(s) = \min \{ \gamma_{m,t}(s),\ t \in \{1, ..., n\} \}$,

$v_{m,n}(z_0) = \sup_{s \in S(z_0)} \gamma_{m,n}(s)$, and

$w_{m,n}(z_0) = \sup_{s \in S(z_0)} \nu_{m,n}(s)$.
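
For a single play, the two evaluations $\gamma_{m,n}(s)$ and $\nu_{m,n}(s)$ are easy to compute from the sequence of rewards. The sketch below assumes the play is given as a finite list `rewards` with `rewards[t - 1]` standing for $r(z_t)$, and that rewards are integers or `Fraction` values; both conventions are illustrative choices, not part of the paper.

```python
from fractions import Fraction

def gamma(rewards, m, n):
    """gamma_{m,n}(s): average reward over stages m+1, ..., m+n."""
    return Fraction(1, n) * sum(rewards[m:m + n])

def nu(rewards, m, n):
    """nu_{m,n}(s): worst average over the horizons t = 1, ..., n started after stage m."""
    return min(gamma(rewards, m, t) for t in range(1, n + 1))
```

The quantity $\nu_{m,n}$ is the payoff of a decision maker who does not know how long the block starting after stage $m$ will last; $w_{m,n}(z_0)$ is its supremum over plays.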

By proposition 2.40 and corollary 2.29, it is easy to obtain for every $m$ and
$n$, the equality of the values in the stochastic game $\Gamma(u)$ and in the MDP $\Psi(z_0)$ :
$v_{m,n}(u) = v_{m,n}(z_0)$. As a consequence, we also have $v^*(u) = \inf_{n \geq 1} \sup_{m \geq 0} v_{m,n}(u) =
v^*(z_0)$ (see definitions 3.6 of Renault, 2007 and 2.25 here). Similarly, we have
$w_{m,n}(z_0) = \sup_{\sigma \in \Sigma^M} \min \{ \gamma^u_{m,t}(\sigma),\ t \in \{1, ..., n\} \} = w_{m,n}(u)$ (see definition 2.31).

Define now $\underline{v}^M(u)$ as the maximal quantity that can be guaranteed by player
1 in $\Gamma(u)$ with Markov strategies :

$$\underline{v}^M(u) = \sup_{\sigma \in \Sigma^M} \liminf_n \left( \inf_{\tau \in \mathcal{T}} \gamma^u_n(\sigma, \tau) \right) = \sup_{\sigma \in \Sigma^M} \liminf_n \gamma^u_n(\sigma).$$

Recall that the lower value of the MDP is defined by :

$\underline{v}(z_0) = \sup_{(z_t)_{t \geq 1} \in S(z_0)} \left( \liminf_n \frac{1}{n} \sum_{t=1}^{n} r(z_t) \right)$. Again, proposition 2.40 gives
the equality $\underline{v}^M(u) = \underline{v}(z_0)$, so that we have the following relations :

$$\underline{v}(z_0) = \underline{v}^M(u) \leq \underline{v}(u) \leq \liminf_N v_N(u) = \liminf_N v_N(z_0)$$
$$\leq \limsup_N v_N(z_0) = \limsup_N v_N(u) \leq \overline{v}(u) \leq v^*(u) = v^*(z_0).$$

We can now conclude. We use the Wasserstein distance on $\Delta_f(X)$, so $Z$ naturally
is a precompact metric space. For every $m$ and $n$, by corollary 2.38,
$(u \mapsto w_{m,n}(u))$ is a non expansive mapping from $\Delta_f(X)$ to $[0, 1]$. This implies that
$(z_0 \mapsto w_{m,n}(z_0))$ is a non expansive mapping from $Z$ to $[0, 1]$. As a consequence
the family $(w_{m,n})_{m \geq 0, n \geq 1}$ is uniformly continuous. By corollary 3.8 of Renault,
2007, we obtain that the MDP $\Psi(z_0)$ has a uniform value which is :

$$v^*(z_0) = \underline{v}(z_0) = \lim_N v_N(z_0) = \sup_{m \geq 0} \inf_{n \geq 1} w_{m,n}(z_0) = \sup_{m \geq 0} \inf_{n \geq 1} v_{m,n}(z_0).$$

And the convergence from $(v_n)$ to $v^*$ is uniform. Back to our stochastic game
$\Gamma(u)$, we obtain that $(v_n)_n$ uniformly converges to $v^*$, and

$$v^*(u) = \underline{v}^M(u) = \underline{v}(u) = \lim_N v_N(u) = \overline{v}(u).$$

So $\underline{v}(u) = \overline{v}(u)$, which implies that $\Gamma(u)$ has a uniform value. Moreover, $v^*(u) =
\sup_{m \geq 0} \inf_{n \geq 1} w_{m,n}(u) = \sup_{m \geq 0} \inf_{n \geq 1} v_{m,n}(u)$, and using definition 3.6 of Renault,
2007, $v^*(u) = \inf_{n \geq 1} \sup_{m \geq 0} w_{m,n}(u) = \inf_{n \geq 1} \sup_{m \geq 0} v_{m,n}(u)$. This concludes the
proof of theorem 2.16.

□ □ □ □ 

2.4 Comments. 

Remark 2.41. Player 2 has 0-optimal strategies.

Under the same hypotheses H1,..., H6, it is possible to slightly modify the proof
of Proposition 2.23 and obtain that player 2 has a Markov strategy $\tau$ which is 0-optimal
in $\Gamma(u)$, i.e. such that : $\forall \varepsilon > 0$, $\exists N_0$, $\forall N \geq N_0$, $\forall \sigma \in \Sigma$, $\gamma^u_N(\sigma, \tau) \leq v(u) + \varepsilon$.

Divide the set of stages into consecutive blocks $B^1, ..., B^m, ...$ such that $B^m$ has
cardinal $m$ for each $m$. By lemma 2.22 there exists a Markov strategy $\tau = (\tau_t)_{t \geq 1}$
with the property that for every $m > 0$, $\tau$ plays optimally within $B^m$, in the sense
that $\tau$ is an optimal strategy for player 2 in $\Gamma_{m(m-1)/2,\, m}(u)$. For every strategy $\sigma$
of player 1, we have :

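$$\forall m \geq 1, \quad \mathbb{E}_{\mathbb{P}_{u,\sigma,\tau}} \left( \frac{2}{m(m+1)} \sum_{t=1}^{\max(B^m)} g(p_t, a_t, b_t) \right) \leq \frac{2}{m(m+1)} \sum_{i=1}^{m} i\, v_{i(i-1)/2,\, i}(u).$$

We have seen in subsubsection 2.3.4 that the values in the stochastic game $\Gamma(u)$
are the values of the MDP $\Psi(z_0)$, so we can apply lemma 3.4 of Renault, 2007 :
$\forall i \geq 1$, $\forall k \geq 1$, $v_{i(i-1)/2,\, i}(u) \leq \sup_{l \geq 0} w_{l,k}(u) + \frac{k-1}{i}$.

Fix now $\varepsilon > 0$. One can find $k$ such that $\sup_{l \geq 0} w_{l,k}(u) \leq v^*(u) + \varepsilon$. Since
$\frac{k-1}{i} \to_{i \to \infty} 0$, one can find $m_0$ such that for every $m \geq m_0$,

$$\mathbb{E}_{\mathbb{P}_{u,\sigma,\tau}} \left( \frac{2}{m(m+1)} \sum_{t=1}^{\max(B^m)} g(p_t, a_t, b_t) \right) \leq \frac{2}{m(m+1)} \sum_{i=1}^{m} i \left( v^*(u) + \varepsilon + \frac{k-1}{i} \right) \leq v^*(u) + 2\varepsilon.$$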



Looking at the size of the blocks, one can show that $\tau$ is 0-optimal for player 2. $\square$
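
The block structure used in this remark can be made explicit with a short sketch. It assumes stages are numbered from 1 and that the $m$-th block consists of the $m$ stages following stage $m(m-1)/2$, which is what the reference to $\Gamma_{m(m-1)/2,\, m}(u)$ suggests; the function name `block` is hypothetical.

```python
def block(m):
    """Stages of the m-th block: m consecutive stages starting after stage m(m-1)/2."""
    start = m * (m - 1) // 2 + 1
    return range(start, start + m)

def block_of_stage(t):
    """Index m of the block containing stage t >= 1."""
    m = 1
    while m * (m + 1) // 2 < t:
        m += 1
    return m
```

Since the last stage of block $m$ is $m(m+1)/2$, a stage $N$ lies in a block of size roughly $\sqrt{2N}$, so the incomplete last block has a vanishing weight in the $N$-stage average.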



Remark 2.42. A simple hypothesis implying H5.

Recall that hypothesis H5 requires the existence of some subset $\mathcal{D}$ of $E_1$ which
contains $\Phi(1, 0)$, and is stable under any $\Phi(\alpha, .)$. The following hypothesis is stated
in terms of the mappings $g$ and $l$. The distance $d$ on $X$ is extended to $\Delta(X)$
by the Wasserstein distance.

Hypothesis H5' : $\forall p \in X$, $\forall a \in A$, $\forall p' \in X$, $\exists a' \in A$ such that :

$d(l(p, a), l(p', a')) \leq d(p, p')$, and $\inf_{b \in B} g(p', a', b) \geq \inf_{b \in B} g(p, a, b) - d(p, p')$.

It is easy to check that H5' implies that $E_1$ itself is stable under any $\Phi(\alpha, .)$,
so H5' implies H5. Consequently, the conclusions of theorem 2.16 are true if
H1, H2, H3, H4, H5', H6, H7 hold.
3 Repeated games with an informed controller



3.1 Model 

We first consider a general model of zero-sum repeated game. We have : 

• five non empty finite sets : a set of states or parameters K, a set 
I of actions for player 1, a set J of actions for player 2, a set C of 
signals for player 1 and a set D of signals for player 2. 

• an initial distribution $\pi \in \Delta(K \times C \times D)$,

• a mapping $g$ from $K \times I \times J$ to $[0, 1]$, called the payoff function of
player 1, and

• a mapping $q$ from $K \times I \times J$ to $\Delta(K \times C \times D)$, called the transition
function.

Initially, $(k_1, c_1, d_1)$ is selected according to $\pi$, player 1 learns $c_1$ and player 2
learns $d_1$. Then simultaneously player 1 chooses $i_1$ in $I$ and player 2 chooses $j_1$
in $J$. The payoff for player 1 is $g(k_1, i_1, j_1)$, then $(k_2, c_2, d_2)$ is selected according
to $q(k_1, i_1, j_1)$, etc... At any stage $t \geq 2$, $(k_t, c_t, d_t)$ is selected according to
$q(k_{t-1}, i_{t-1}, j_{t-1})$, player 1 learns $c_t$ and player 2 learns $d_t$. Simultaneously, player
1 chooses $i_t$ in $I$ and player 2 chooses $j_t$ in $J$. The stage payoffs are $g(k_t, i_t, j_t)$ for
player 1 and the opposite for player 2, and the play proceeds to stage $t + 1$.

From now on we fix $\Gamma = (K, I, J, C, D, g, q)$, and for every $\pi$ in $\Delta(K \times C \times
D)$ we denote by $\Gamma(\pi) = (K, I, J, C, D, \pi, g, q)$ the corresponding repeated game
induced by $\pi$. For the moment we make no assumption on $\Gamma$, so we have a general
model including stochastic games, repeated games with incomplete information 
and imperfect monitoring (signals). We start with elementary definitions and 
notations. 

Players are allowed to select their actions randomly. A (behavior) strategy
for player 1 is a sequence $\sigma = (\sigma_t)_{t \geq 1}$, where for each $t$, $\sigma_t$ is a mapping from
$(C \times I)^{t-1} \times C$ to $\Delta(I)$, with the interpretation that $\sigma_t(c_1, i_1, ..., c_{t-1}, i_{t-1}, c_t)$ is
the lottery on actions used by player 1 at stage $t$ after $(c_1, i_1, ..., c_{t-1}, i_{t-1}, c_t)$.
$\sigma_1$ is just a mapping from $C$ to $\Delta(I)$ giving the first action played by player 1
depending on his initial signal. Similarly, a strategy for player 2 is a sequence
$\tau = (\tau_t)_{t \geq 1}$, where for each $t$, $\tau_t$ is a mapping from $(D \times J)^{t-1} \times D$ to $\Delta(J)$. We
denote by $\Sigma$ and $\mathcal{T}$ the sets of strategies of player 1 and player 2, respectively.

It is standard that a pair of strategies $(\sigma, \tau)$ induces a probability $\mathbb{P}_{\pi,\sigma,\tau}$ on
the set of plays $\Omega = (K \times C \times D \times I \times J)^\infty$, endowed with the $\sigma$-algebra generated
by the cylinders.

Definition 3.1. The payoff for player 1 induced by $(\sigma, \tau)$ at the first $N$ stages is
denoted by :

$$\gamma^\pi_N(\sigma, \tau) = \mathbb{E}_{\mathbb{P}_{\pi,\sigma,\tau}} \left( \frac{1}{N} \sum_{t=1}^{N} g(k_t, i_t, j_t) \right).$$

For $\pi$ in $\Delta(K \times C \times D)$ and $N \geq 1$, the $N$-stage game $\Gamma_N(\pi)$ is the
zero-sum game with normal form $(\Sigma, \mathcal{T}, \gamma^\pi_N)$. By Kuhn's theorem, $\Gamma_N(\pi)$ can be
seen as the mixed extension of a finite game, so it has a value $v_N(\pi)$.

The following definitions are similar to those of subsection 2.1 or section 2 of
Renault, 2007.

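Definition 3.2. Let $\pi$ be in $\Delta(K \times C \times D)$.

The liminf value of $\Gamma(\pi)$ is : $v^-(\pi) = \liminf_n v_n(\pi)$.

The limsup value of $\Gamma(\pi)$ is : $v^+(\pi) = \limsup_n v_n(\pi)$.

The lower (or maxmin) value of $\Gamma(\pi)$ is :

$$\underline{v}(\pi) = \sup_{\sigma \in \Sigma} \liminf_n \left( \inf_{\tau \in \mathcal{T}} \gamma^\pi_n(\sigma, \tau) \right).$$

The upper (or minmax) value of $\Gamma(\pi)$ is :

$$\overline{v}(\pi) = \inf_{\tau \in \mathcal{T}} \limsup_n \left( \sup_{\sigma \in \Sigma} \gamma^\pi_n(\sigma, \tau) \right).$$

We have $\underline{v}(\pi) \leq v^-(\pi) \leq v^+(\pi) \leq \overline{v}(\pi)$.

$\Gamma(\pi)$ is said to have a uniform value if and only if $\underline{v}(\pi) = \overline{v}(\pi)$, and in this case
the uniform value is $\underline{v}(\pi) = \overline{v}(\pi)$.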

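An equivalent definition of the uniform value is as follows. Given a real number
$v$, we say that player 1 can guarantee $v$ in $\Gamma(\pi)$ if : $\forall \varepsilon > 0$, $\exists \sigma \in \Sigma$, $\exists N_0$, $\forall N \geq
N_0$, $\forall \tau \in \mathcal{T}$, $\gamma^\pi_N(\sigma, \tau) \geq v - \varepsilon$. Player 2 can guarantee $v$ in $\Gamma(\pi)$ if : $\forall \varepsilon > 0$, $\exists \tau \in
\mathcal{T}$, $\exists N_0$, $\forall N \geq N_0$, $\forall \sigma \in \Sigma$, $\gamma^\pi_N(\sigma, \tau) \leq v + \varepsilon$. If player 1 can guarantee $v$ and player
2 can guarantee $w$ then clearly $w \geq v$. We also have, exactly as in claim 2.6 :

Claim 3.3. $\underline{v}(\pi) = \max \{ v \in \mathbb{R},\ \text{player 1 can guarantee } v \text{ in } \Gamma(\pi) \}$,
$\overline{v}(\pi) = \min \{ v \in \mathbb{R},\ \text{player 2 can guarantee } v \text{ in } \Gamma(\pi) \}$.
A real number $v$ can be guaranteed by both players if and only if $v$ is the
uniform value of $\Gamma(\pi)$.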

We now consider hypotheses on $q$ and $\pi$.

Hypothesis HA : Player 1 is informed of everything, i.e. at any stage $t \geq 1$, he
can deduce from his signal $c_t$ : the state $k_t$, player 2's signal $d_t$, and if $t \geq 2$, he
can also deduce from $c_t$ the action $j_{t-1}$ previously played by player 2.

Hypothesis HB : Player 1 fully controls the transition, i.e. $q(k, i, j)$ does not
depend on $j$ for each $(k, i)$ in $K \times I$.
HA and HB are very strong hypotheses, and they are incompatible as soon as
$J$ has several elements. We will use weaker hypotheses.

Hypothesis HA' : Player 1 is informed, in the sense that he can always deduce
the state and player 2's signal from his own signal. Formally, there exist two
mappings $k : C \to K$ and $d : C \to D$ such that, if $E$ denotes $\{ (k, c, d) \in
K \times C \times D,\ k(c) = k \text{ and } d(c) = d \}$, then :

$$\pi(E) = 1, \text{ and } q(k, i, j)(E) = 1,\ \forall (k, i, j) \in K \times I \times J.$$

Notice that HA' does not mean that player 1 knows everything. Since we did 
not include the signals in the move, knowing the signal d t of player 2 at some 
stage t does not imply knowing the action j t by player 2. However, not knowing 
this action will not be a problem for player 1 because we will also assume that 
player 2 does not really influence the transitions. 

Hypothesis HB' : Player 1 controls the transition, in the sense that the marginal
of the transition $q$ on $K \times D$ does not depend on player 2's action. For $k$ in
$K$, $i$ in $I$ and $j$ in $J$, we denote by $q(k, i)$ the marginal of $q(k, i, j)$ on $K \times D$.

Assume that HA' and HB' hold. The couple (new state, signal of player 2)
is selected according to a distribution depending on the current state and player
1's action, but not depending on player 2's action. Player 2 may influence the
distribution of player 1's signal, but still player 1 will be able to deduce the state
and player 2's new information on the state. So essentially player 2 can influence
player 1's knowledge about player 2's action. But this information is not relevant
because it does not affect player 2's belief on the future states.

Theorem 3.4. Under the hypotheses HA' and HB', the repeated game $\Gamma(\pi)$ has
a uniform value.

The next subsection is devoted to the proof of theorem 3.4. See subsection 3.3
for other comments on hypotheses, applications and open questions.

3.2 Proof of theorem 3.4

We assume in this subsection that HA' and HB' are satisfied. Keeping fixed
all other quantities, increasing the set $C$ of signals for player 1 has no influence
on the existence of the uniform value, so in the sequel we will assume w.l.o.g.
that : The mapping $( c \mapsto (k(c), d(c)) )$ is a surjection from $C$ to $K \times D$.

We put $X = \Delta(K)$. An element $u$ in $\Delta_f(X)$ is written $u = \sum_{p \in X} u(p) \delta_p$. As
in the previous section, we use the Wasserstein distance, and the (reverse of) the
Choquet order on $\Delta(X)$. $\forall u \in \Delta(X)$, $\forall v \in \Delta(X)$, $d(u, v) = \sup_{f : X \to \mathbb{R},\ 1\text{-Lip}} |u(f) -
v(f)|$. And we write $u \succeq v$ iff for every continuous concave real valued mapping
$f$ defined on $X$, $u(f) \geq v(f)$.

If $S$ is a finite set, we use the norm $\|.\|_1$ on $\mathbb{R}^S$. The set of probability distributions
$\Delta(S)$ is viewed (see footnote 4) as a subset of $\mathbb{R}^S$.

3.2.1 Value of finite games. 

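As in definition 2.19, we need to consider a large family of finite games.

Definition 3.5. Let $\theta = \sum_{t \geq 1} \theta_t \delta_t$ be in $\Delta_f(\mathbb{N}^*)$, i.e. $\theta$ is a probability with finite
support over positive integers. For $\pi$ in $\Delta(K \times C \times D)$, the game $\Gamma_{[\theta]}(\pi)$ is the
game with normal form $(\Sigma, \mathcal{T}, \gamma^\pi_{[\theta]})$, where :

$$\gamma^\pi_{[\theta]}(\sigma, \tau) = \mathbb{E}_{\mathbb{P}_{\pi,\sigma,\tau}} \left( \sum_{t=1}^{\infty} \theta_t\, g(k_t, i_t, j_t) \right).$$

Particular cases : if $\theta = \frac{1}{n} \sum_{t=1}^{n} \delta_t$, $\Gamma_{[\theta]}(\pi)$ is nothing but $\Gamma_n(\pi)$.

For $m \geq 0$ and $n \geq 1$, we denote by $\Gamma_{m,n}(\pi)$ the game $\Gamma_{[\theta]}(\pi)$ where $\theta =
\frac{1}{n} \sum_{t=m+1}^{m+n} \delta_t$. The payoff function is written in this case : $\gamma^\pi_{m,n}(\sigma, \tau)$. The value
of $\Gamma_{m,n}(\pi)$ will be denoted by $v_{m,n}(\pi)$.

Notice that $v_{0,n}$ is just $v_n$, the value of the $n$-stage game. The following lemma
is true without the hypotheses HA' and HB'.

Lemma 3.6. For every $\theta \in \Delta_f(\mathbb{N}^*)$ and $\pi \in \Delta(K \times C \times D)$, the game $\Gamma_{[\theta]}(\pi)$ has
a value, denoted by $v_{[\theta]}(\pi)$, and both players have optimal strategies. Moreover,
$v_{[\theta]}$ is a non expansive mapping from $\Delta(K \times C \times D)$ to $\mathbb{R}$.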

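Proof : The existence of the value and optimal strategies is standard. Notice
that for every $\theta$, $\pi$, and strategy pair $(\sigma, \tau)$ :

$$\gamma^\pi_{[\theta]}(\sigma, \tau) = \sum_{k_1, c_1, d_1} \pi(k_1, c_1, d_1)\, \gamma^{(k_1, c_1, d_1)}_{[\theta]}(\sigma, \tau).$$

Since we use $\|.\|_1$, $v_{[\theta]}$ is 1-Lipschitz. $\square$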

Definition 3.7. We define a mapping $\Psi$ from $\Delta(K \times D)$ to $\Delta_f(X)$ by : for
each probability $\pi$ on $K \times D$, $\Psi(\pi) = \sum_{d \in D} \pi(d) \delta_{\pi^d}$, where for each $d$, $\pi^d$ is the
conditional probability on $K$ given $d$ issued from $\pi$.
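
The mapping of Definition 3.7 sends a joint law on (state, signal of player 2) to the induced law of player 2's posterior belief on the state. A minimal sketch, assuming the joint law is given as a dictionary `{(k, d): probability}` with `Fraction` values and that a belief is encoded as a tuple indexed by a fixed ordering `states` of $K$ (both are illustrative conventions):

```python
from fractions import Fraction

def psi(pi, states):
    """Return {posterior on K: probability}, i.e. sum_d pi(d) * delta_{pi^d}."""
    law = {}
    signals = sorted({d for (_k, d) in pi})
    for d in signals:
        mass = sum(pi.get((k, d), 0) for k in states)  # pi(d)
        if mass == 0:
            continue  # signals of probability zero carry no atom
        posterior = tuple(Fraction(pi.get((k, d), 0)) / mass for k in states)
        law[posterior] = law.get(posterior, 0) + mass  # merge equal posteriors
    return law

def barycenter(law):
    """Mean belief sum_p law(p) * p, which should equal the marginal of pi on K."""
    size = len(next(iter(law)))
    return tuple(sum(w * p[i] for p, w in law.items()) for i in range(size))
```

The barycenter of the resulting law is always the marginal on $K$, which is the martingale property of posterior beliefs.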

Notation 3.8. Let $\pi$ be in $\Delta(K \times C \times D)$. We denote by $\bar{\pi}$ the marginal of
$\pi$ on $K \times D$, and denote by $\tilde{\pi}$ the probability induced by $\pi$ (or $\bar{\pi}$) on $X$, i.e.
$\tilde{\pi} = \Psi(\bar{\pi}) = \sum_{d \in D} \bar{\pi}(d) \delta_{\bar{\pi}^d} \in \Delta_f(X)$, where for each $d$, $\bar{\pi}^d$ is the conditional
probability on $K$ given $d$ issued from $\pi$ (or $\bar{\pi}$).

We also put $\Delta_E = \{ \pi \in \Delta(K \times C \times D),\ \pi(E) = 1 \}$, where $E = \{ (k, c, d) \in
K \times C \times D,\ k(c) = k \text{ and } d(c) = d \}$.

4 Notice that if we put $d(s, s') = 2$ for any distinct elements of $S$, then for every $p$ and $q$ in
$\Delta(S)$ we have $\sup_{f : S \to \mathbb{R},\ 1\text{-Lip}} | \sum_s p_s f(s) - \sum_s q_s f(s) | = \|p - q\|_1$.
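Lemma 3.9. Let $\pi$ and $\pi'$ be in $\Delta_E$ such that $\bar{\pi} = \bar{\pi}'$. Then $v_{[\theta]}(\pi) = v_{[\theta]}(\pi')$ for
each $\theta$.

Proof : Fix $\theta$ in $\Delta_f(\mathbb{N}^*)$, $\pi$ in $\Delta_E$, and a strategy pair $(\sigma, \tau)$.

$$\gamma^\pi_{[\theta]}(\sigma, \tau) = \sum_{d_1} \pi(d_1) \sum_{k_1} \pi(k_1 | d_1) \sum_{c_1} \pi(c_1 | d_1, k_1)\, \gamma^{(k_1, c_1, d_1)}_{[\theta]}(\sigma, \tau).$$

Since $\pi(E) = 1$, player 1 can deduce $d_1$ and $k_1$ from $c_1$, so :

$$\sup_{\sigma \in \Sigma} \gamma^\pi_{[\theta]}(\sigma, \tau) = \sum_{d_1} \pi(d_1) \sum_{k_1} \pi(k_1 | d_1) \sum_{c_1} \pi(c_1 | d_1, k_1) \sup_{\sigma \in \Sigma} \gamma^{(k_1, c_1, d_1)}_{[\theta]}(\sigma, \tau).$$

But $\sup_{\sigma \in \Sigma} \gamma^{(k_1, c_1, d_1)}_{[\theta]}(\sigma, \tau)$ does not depend on $c_1$, so for any fixed $c^*$ in $C$,

$$\sup_{\sigma \in \Sigma} \gamma^\pi_{[\theta]}(\sigma, \tau) = \sum_{d_1} \pi(d_1) \sum_{k_1} \pi(k_1 | d_1) \sup_{\sigma \in \Sigma} \gamma^{(k_1, c^*, d_1)}_{[\theta]}(\sigma, \tau).$$

Consequently, $\sup_{\sigma \in \Sigma} \gamma^\pi_{[\theta]}(\sigma, \tau)$ only depends on $\bar{\pi}$, $\tau$ and $\theta$, and $v_{[\theta]}(\pi) = \inf_{\tau \in \mathcal{T}}
\sup_{\sigma \in \Sigma} \gamma^\pi_{[\theta]}(\sigma, \tau)$ only depends on $\bar{\pi}$ and $\theta$. $\square$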

Remember that we assumed w.l.o.g. that the function (k , d) appearing in 
hypothesis HA' is surjective. It will be convenient in the sequel to use the following 
notation. 

Notation 3.10. For any (k,d) in K x D, we fix an element c(k,d) in C such 
that k(c(k, d)) = k and d(c(k, d)) = d. 

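$v_{[\theta]}$ has been defined as a mapping from $\Delta(K \times C \times D)$ to $\mathbb{R}$. We now define
value functions with domain $X$ and $\Delta(X)$. Let $p$ be in $X$. We define $\pi$ in $\Delta_E$ as
follows : fix $d^*$ in $D$, and $\pi$ chooses, for each $k$ in $K$, the element $(k, c(k, d^*), d^*)$
with probability $p^k$. Then $\tilde{\pi} = \delta_p$, so by the previous lemma $v_{[\theta]}(\pi)$ only depends
on $p$. We thus define :

$$v_{[\theta]}(p) = v_{[\theta]}(\pi).$$

With a slight abuse of notation, $v_{[\theta]}$ now also denotes a mapping from $\Delta(K)$ to
$\mathbb{R}$. And we have for each $\pi$ in $\Delta_E$ :

$$v_{[\theta]}(\pi) = \inf_{\tau \in \mathcal{T}} \sup_{\sigma \in \Sigma} \gamma^\pi_{[\theta]}(\sigma, \tau)$$
$$= \inf_{\tau \in \mathcal{T}} \sum_{d_1} \pi(d_1) \sum_{k_1} \pi(k_1 | d_1) \sup_{\sigma \in \Sigma} \gamma^{(k_1, c^*, d_1)}_{[\theta]}(\sigma, \tau)$$
$$= \sum_{d_1} \pi(d_1) \inf_{\tau \in \mathcal{T}} \sum_{k_1} \pi(k_1 | d_1) \sup_{\sigma \in \Sigma} \gamma^{(k_1, c^*, d_1)}_{[\theta]}(\sigma, \tau).$$

So we have obtained the following.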
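Lemma 3.11.

$$\forall \pi \in \Delta_E,\ \forall \theta \in \Delta_f(\mathbb{N}^*), \quad v_{[\theta]}(\pi) = \sum_{d_1 \in D} \bar{\pi}(d_1)\, v_{[\theta]}(\bar{\pi}^{d_1}).$$

Notation 3.12. $\tilde{v}_{[\theta]}$ denotes the affine extension of $v_{[\theta]}$ on $\Delta(X)$, i.e. : $\forall u \in
\Delta(X)$, $\tilde{v}_{[\theta]}(u) = \int_{p \in X} v_{[\theta]}(p)\, du(p)$.

From the previous computations, $\tilde{v}_{[\theta]}$ is clearly linked to the original value
function $v_{[\theta]}$.

Claim 3.13.

$$\forall \pi \in \Delta_E, \quad v_{[\theta]}(\pi) = \tilde{v}_{[\theta]}(\tilde{\pi}).$$

So from the knowledge of $v_{[\theta]}$ on $X$, one can deduce its extension $\tilde{v}_{[\theta]}$ on $\Delta(X)$
and then the original value function $v_{[\theta]}$ on $\Delta_E$. We have, for each $p$ in $\Delta(K)$ (and
for any $d^*$ in $D$) :

$v_{[\theta]}(p) = \inf_{\tau \in \mathcal{T}} \left( \sum_{k_1} p^{k_1} \sup_{\sigma \in \Sigma} \gamma^{(k_1, c(k_1, d^*), d^*)}_{[\theta]}(\sigma, \tau) \right)$. So $v_{[\theta]}$ is a concave and
non expansive mapping from $\Delta(K)$ to $\mathbb{R}$. Consequently, $\tilde{v}_{[\theta]}$ is a non decreasing
and non expansive mapping from $\Delta(X)$ to $\mathbb{R}$.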

We finally define :

Definition 3.14. For $\pi$ in $\Delta(K \times C \times D)$, we put :

$$v^*(\pi) = \inf_{n \geq 1} \sup_{m \geq 0} v_{m,n}(\pi).$$

3.2.2 An auxiliary stochastic game. 

We now introduce a stochastic game with complete information, to be played
in pure strategies, as in section 2.

Definition 3.15. Recall that $X = \Delta(K)$. We put $A = \Delta(I)^K$ and $B = \Delta(J)$,
and define for every $p$ in $X$, $a$ in $A$ and $b$ in $B$ :

$$g(p, a, b) = \sum_{k \in K} p^k \sum_{i \in I} \sum_{j \in J} a^k(i)\, b(j)\, g(k, i, j),$$
$$g(p, a) = \inf_{b \in B} g(p, a, b),$$
$$Q(p, a, b) = \sum_{(k, i, j) \in K \times I \times J} p^k a^k(i)\, b(j)\, q(k, i, j) \in \Delta(K \times C \times D),$$
$$Q(p, a) = \sum_{(k, i) \in K \times I} p^k a^k(i)\, q(k, i) \in \Delta(K \times D),$$
$$l(p, a) = \Psi(Q(p, a)).$$

$g$ is a mapping from $X \times A \times B$ to $[0,1]$, whereas $l$ is a mapping from $X \times A$ to
$\Delta_f(X)$.

For $u$ in $\Delta_f(X)$, we write $\Gamma(u)$ for the stochastic game $(X, A, B, g, l, u)$ with
initial distribution $u$.
By hypothesis HB', it is easy to see that the marginal of $Q(p, a, b)$ on $K \times D$
does not depend on $b$, and precisely is $Q(p, a)$. $l(p, a)$ is nothing but
$\sum_{d \in D} Q(p, a)(d)\, \delta_{Q(p, a)^d}$, where for each $d$, $Q(p, a)^d$ is the conditional probability
on $K$ given $d$ issued from $Q(p, a)$. Notice also that for every $(p, a, b)$, $Q(p, a, b)$
belongs to the convex set $\Delta_E$.

Suppose that in the original game $\Gamma(\pi)$, the current state is selected according
to $p$, player 1 knows $k$ and plays the mixed action $a^k \in \Delta(I)$, whereas player 2
just knows $p$ and plays the mixed action $b \in \Delta(J)$. Then $g(p, a, b)$ is the (ex-ante)
expected payoff for player 1, and $l(p, a)$ is the (ex-ante) distribution of player 2's
future belief on the next state.

We will eventually apply theorem 2.16 to $\Gamma(\tilde{\pi})$, so we have to check the hypotheses
H1 to H7 of section 2. H1, H2, H3 and H4 are clearly true. We now need
the following properties of the mapping $\Psi$.
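Lemma 3.16. $\Psi$ is concave :

$\forall \pi', \pi'' \in \Delta(K \times D)$, $\forall \lambda \in [0,1]$, $\Psi(\lambda \pi' + (1 - \lambda) \pi'') \succeq \lambda \Psi(\pi') + (1 - \lambda) \Psi(\pi'')$.

Proof : Write $\pi = \lambda \pi' + (1 - \lambda) \pi''$. Notice that for each $d$ in $D$, $\pi^d = \frac{1}{\pi(d)} ( \lambda \pi'(d) \pi'^d +
(1 - \lambda) \pi''(d) \pi''^d )$. Let $f$ be a concave continuous mapping from $X$ to $\mathbb{R}$, we have
to show that $\Psi(\pi)(f) \geq \lambda \Psi(\pi')(f) + (1 - \lambda) \Psi(\pi'')(f)$.

$$\lambda \Psi(\pi')(f) + (1 - \lambda) \Psi(\pi'')(f) = \lambda \sum_d \pi'(d) f(\pi'^d) + (1 - \lambda) \sum_d \pi''(d) f(\pi''^d)$$
$$= \sum_d \pi(d) \left( \frac{\lambda \pi'(d)}{\pi(d)} f(\pi'^d) + \frac{(1 - \lambda) \pi''(d)}{\pi(d)} f(\pi''^d) \right)$$
$$\leq \sum_d \pi(d) f(\pi^d) = \Psi(\pi)(f). \qquad \square$$

For any $p$ in $X$, the marginal $Q(p, a)$ is affine in $a$, so we obtain that $l(p, a) =
\Psi(Q(p, a))$ is concave in $a$. Hypothesis H6 will then immediately follow from the
next lemma.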

Lemma 3.17. $\Psi$ is continuous.

Proof : Let $(\pi_n)_n$ be a sequence in $\Delta(K \times D)$ converging for the norm $\|.\|_1$ to $\pi$.
It is easy to see that for every $f$ continuous, $\Psi(\pi_n)(f) = \sum_d \pi_n(d) f(\pi_n^d)$ converges
as $n$ goes to infinity to $\sum_d \pi(d) f(\pi^d) = \Psi(\pi)(f)$. $\square$

Remark 3.18. One can show that $\Psi$ is Lipschitz, but it is not 1-Lipschitz when
$\|.\|_1$ and the Wasserstein distance are used.
Lemma 3.19. "Splitting hypothesis" H7. Consider a convex combination $p =
\sum_{s \in S} \lambda_s p_s$ in $X$, and a family of actions $(a_s)_{s \in S}$ in $A^S$. Then there exists $a$ in $A$
such that :

$$l(p, a) \succeq \sum_{s \in S} \lambda_s\, l(p_s, a_s) \quad \text{and} \quad g(p, a) \geq \sum_{s \in S} \lambda_s\, g(p_s, a_s).$$

Proof : Define $a : K \to \Delta(I)$ with the well known splitting procedure of
Aumann and Maschler : observe $k$ in $K$ which has been selected according to
$p$, then choose $s$ with probability $\lambda_s p_s^k / p^k$, and finally play $a_s^k$. Formally, put
$a^k = \sum_{s \in S} \frac{\lambda_s p_s^k}{p^k} a_s^k$ if $p^k > 0$, and define arbitrarily $a^k$ if $p^k = 0$. We have :

$$Q(p, a) = \sum_{k \in K} \sum_{i \in I} p^k a^k(i)\, q(k, i)$$
$$= \sum_{s \in S} \lambda_s \sum_{k \in K} \sum_{i \in I} p_s^k a_s^k(i)\, q(k, i)$$
$$= \sum_{s \in S} \lambda_s Q(p_s, a_s).$$

So by concavity of $\Psi$, we have $l(p, a) \succeq \sum_s \lambda_s\, l(p_s, a_s)$.

Regarding payoffs, we have for each $b$ in $B$, $g(p, a, b) = \sum_s \lambda_s\, g(p_s, a_s, b)$, so
$\inf_{b \in B} g(p, a, b) \geq \sum_s \lambda_s \inf_{b \in B} g(p_s, a_s, b)$. $\square$
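
The splitting procedure used in this proof can be checked numerically. The sketch below builds the action $a$ from the weights $\lambda_s$, the beliefs $p_s$ and the actions $a_s$, and evaluates the bilinear payoff $g(p, a, b)$. Beliefs and mixed actions are plain lists of `Fraction` values and the payoff table is a nested list `g[k][i][j]`; these encodings and the function names are assumptions made for the example.

```python
from fractions import Fraction

def split_action(lams, ps, acts):
    """Return (p, a) with p = sum_s lam_s p_s and, when p^k > 0,
    a^k = sum_s (lam_s p_s^k / p^k) a_s^k."""
    n_states = len(ps[0])
    n_actions = len(acts[0][0])
    p = [sum(lam * q[k] for lam, q in zip(lams, ps)) for k in range(n_states)]
    a = []
    for k in range(n_states):
        if p[k] == 0:
            # a^k is arbitrary on states of probability zero; uniform is one choice.
            a.append([Fraction(1, n_actions)] * n_actions)
        else:
            a.append([
                sum(lam * q[k] * act[k][i] for lam, q, act in zip(lams, ps, acts)) / p[k]
                for i in range(n_actions)
            ])
    return p, a

def payoff(p, a, b, g):
    """g(p, a, b) = sum over (k, i, j) of p^k a^k(i) b(j) g(k, i, j)."""
    return sum(p[k] * a[k][i] * b[j] * g[k][i][j]
               for k in range(len(p))
               for i in range(len(a[k]))
               for j in range(len(b)))
```

The identity $p^k a^k(i) = \sum_s \lambda_s p_s^k a_s^k(i)$ is what makes both $Q(p, a)$ and $g(p, a, b)$ equal to the corresponding convex combinations.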

Up to now, only H5 remains to be proved. 



3.2.3 The recursive formula. 

We now prove a standard recursive formula for the value functions. As in
definition 2.19, for $\theta = \sum_{t \geq 1} \theta_t \delta_t$ we define $\theta^+$ as the law of $t^* - 1$ given that
$t^* \geq 2$, so that $\theta^+ = \sum_{t \geq 1} \frac{\theta_{t+1}}{1 - \theta_1} \delta_t$ if $\theta_1 < 1$, and $\theta^+$ is defined arbitrarily if $\theta_1 = 1$.
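
The shifted evaluation $\theta^+$ is a simple renormalization, sketched below with $\theta$ given as the list `[theta_1, ..., theta_n]` of `Fraction` weights. Returning `None` when $\theta_1 = 1$ is an arbitrary convention of this sketch, matching the fact that $\theta^+$ is then irrelevant.

```python
from fractions import Fraction

def theta_plus(theta):
    """Law of t* - 1 given t* >= 2, when t* has law theta = [theta_1, ..., theta_n]."""
    if theta[0] == 1:
        return None  # theta^+ is arbitrary here: the weight 1 - theta_1 vanishes
    return [weight / (1 - theta[0]) for weight in theta[1:]]
```

For the uniform evaluation on $\{1, ..., n\}$, `theta_plus` returns the uniform evaluation on $\{1, ..., n-1\}$, which gives back the usual recursive formula for the $n$-stage value.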
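Proposition 3.20. For $\theta$ in $\Delta_f(\mathbb{N}^*)$ and $p$ in $X$,

$$v_{[\theta]}(p) = \max_{a \in A} \min_{b \in B} \left( \theta_1 g(p, a, b) + (1 - \theta_1)\, \tilde{v}_{[\theta^+]}(l(p, a)) \right)$$
$$= \min_{b \in B} \max_{a \in A} \left( \theta_1 g(p, a, b) + (1 - \theta_1)\, \tilde{v}_{[\theta^+]}(l(p, a)) \right).$$

For every $\pi$ in $\Delta_E$, in the game $\Gamma_{[\theta]}(\pi)$ both players have optimal strategies only
depending on $\bar{\pi} \in \Delta(K \times D)$.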

Proof : By the proof of lemma 3.9, we know that for any $\tau$ in $\mathcal{T}$ and fixed $c^*$ in
$C$,

$$\sup_{\sigma \in \Sigma} \gamma^\pi_{[\theta]}(\sigma, \tau) = \sum_{d_1} \pi(d_1) \sum_{k_1} \pi(k_1 | d_1) \sup_{\sigma \in \Sigma} \gamma^{(k_1, c^*, d_1)}_{[\theta]}(\sigma, \tau).$$

And $\tau$ is optimal in $\Gamma_{[\theta]}(\pi)$ if and only if $\sum_{d_1} \pi(d_1) \sum_{k_1} \pi(k_1 | d_1) \sup_{\sigma \in \Sigma} \gamma^{(k_1, c^*, d_1)}_{[\theta]}(\sigma, \tau)
= \sum_{d_1} \pi(d_1) \inf_{\tau' \in \mathcal{T}} \sum_{k_1} \pi(k_1 | d_1) \sup_{\sigma \in \Sigma} \gamma^{(k_1, c^*, d_1)}_{[\theta]}(\sigma, \tau')$. Hence player 2 has an
optimal strategy in $\Gamma_{[\theta]}(\pi)$ that only depends on $\bar{\pi}$.

We now show that player 1 has an optimal strategy in $\Gamma_{[\theta]}(\pi)$ that only
depends on $\bar{\pi}$. For every $d_1$ in $D$, fix an optimal strategy $\sigma(d_1)$ of player 1 in the
game $\Gamma_{[\theta]} \left( \sum_{k_1 \in K} \pi(k_1 | d_1)\, \delta_{(k_1, c(k_1, d_1), d_1)} \right)$. Define now $\sigma$ that plays after each
initial signal $c_1$ exactly what $\sigma(d(c_1))$ plays after the initial signal $c(k(c_1), d(c_1))$.
One can check that $\sigma$ is optimal in any game $\Gamma_{[\theta]}(\pi')$, if $\pi' \in \Delta_E$ and $\bar{\pi}' = \bar{\pi}$.

We now prove the recursive formula by induction on the greatest element in
the support of $\theta$. $v_1(p) = \max_{a \in A} \min_{b \in B} g(p, a, b) = \min_{b \in B} \max_{a \in A} g(p, a, b)$ is
easy by Sion's theorem.

Fix now $n \geq 2$, and assume that the proposition is true for every $\theta$ with
support included in $\{1, ..., n - 1\}$. Fix a probability $\theta = \sum_{t=1}^{n} \theta_t \delta_t$, and notice
that $\theta^+$ has a support included in $\{1, ..., n - 1\}$. Fix also $p$ in $X$. The equality
$\max_{a \in A} \min_{b \in B} ( \theta_1 g(p, a, b) + (1 - \theta_1) \tilde{v}_{[\theta^+]}(l(p, a)) ) = \min_{b \in B} \max_{a \in A} ( \theta_1 g(p, a, b) +
(1 - \theta_1) \tilde{v}_{[\theta^+]}(l(p, a)) )$ is standard and similar to the proof of proposition 2.20, so
the proof is omitted here.

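By definition, we have $v_{[\theta]}(p) = v_{[\theta]}(\pi)$, where $\pi = \sum_k p^k \delta_{(k, c(k, d^*), d^*)} \in \Delta(K \times
C \times D)$, and $d^*$ is an arbitrary element of $D$. We thus consider the game $\Gamma_{[\theta]}(\pi)$.
Let $(\sigma, \tau)$ be a strategy pair. Write $a = (a^k)_k = (\sigma_1(c(k, d^*)))_k$ in $A = \Delta(I)^K$ for
the action played by player 1 at stage 1, and write $b = \tau_1(d^*) \in B = \Delta(J)$ for
the first stage action of player 2. We have :

$$\gamma^\pi_{[\theta]}(\sigma, \tau) = \theta_1 g(p, a, b) + (1 - \theta_1) \sum_{k, i_1, j_1} p^k a^k(i_1)\, b(j_1)\, \gamma^{q(k, i_1, j_1)}_{[\theta^+]} \left( \sigma^+_{c(k, d^*), i_1}, \tau^+_{d^*, j_1} \right),$$

where $\sigma^+_{c(k, d^*), i_1}$ and $\tau^+_{d^*, j_1}$ are continuation strategies.

$a$ being fixed, we can choose $\sigma^+$ an optimal strategy for player 1 in the
game $\Gamma_{[\theta^+]}(Q(p, a, j_1))$, and this choice can be made independently of $j_1$ since
the marginal of $Q(p, a, j_1)$ on $K \times D$ does not depend on $j_1$. We have for every $j_1$ in $J$ :

$$\sum_{k, i_1} p^k a^k(i_1)\, \gamma^{q(k, i_1, j_1)}_{[\theta^+]} \left( \sigma^+, \tau^+_{d^*, j_1} \right) \geq v_{[\theta^+]}(Q(p, a, j_1)) = \tilde{v}_{[\theta^+]}(l(p, a)).$$

By playing $a$ at stage 1, and then according to $\sigma^+$ (for any first signal
$c(k, d^*)$ and first action $i_1$), player 1 can thus guarantee : $\inf_b \theta_1 g(p, a, b) + (1 -
\theta_1) \tilde{v}_{[\theta^+]}(l(p, a))$. So $v_{[\theta]}(p) \geq \max_{a \in A} \min_{b \in B} ( \theta_1 g(p, a, b) + (1 - \theta_1) \tilde{v}_{[\theta^+]}(l(p, a)) )$.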

We finally show that player 2 can defend
$\max_{a \in A} \min_{b \in B} \big(\theta_1 g(p,a,b) + (1-\theta_1)\, v_{[\theta^+]}(l(p,a))\big)$ in $\Gamma_{[\theta]}(\pi)$.
Fix a strategy $\sigma$ of player 1. $a = (\sigma_1(c(k,d^*)))_k$ being
fixed, choose $b$ in $B$ achieving $\min_{b' \in B} g(p,a,b')$. We also choose $\tau^+$ in $\mathcal{T}$ an optimal
strategy for player 2 in the game $\Gamma_{[\theta^+]}(Q(p,a,j_1))$, and this choice can be made
independently of $j_1$. $\tau^+$ now being fixed, notice that there exists a strategy $\sigma'$ of
player 1 which is a best reply to $\tau^+$ in any game $\Gamma_{[\theta^+]}(\pi')$, with $\pi' \in \Delta_E$. We now
have



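$$\begin{aligned}
\gamma_{[\theta]}^{\pi}(\sigma,\tau) &= \theta_1\, g(p,a,b) + (1-\theta_1) \sum_{j_1} b(j_1) \sum_{k,i_1} p^k a^k(i_1)\, \gamma_{[\theta^+]}^{q(k,i_1,j_1)}\big(\sigma^+_{k,i_1,j_1}, \tau^+\big) \\
&\leq \theta_1\, g(p,a,b) + (1-\theta_1) \sum_{j_1} b(j_1) \sum_{k,i_1} p^k a^k(i_1)\, \gamma_{[\theta^+]}^{q(k,i_1,j_1)}\big(\sigma', \tau^+\big) \\
&= \theta_1\, g(p,a,b) + (1-\theta_1) \sum_{j_1} b(j_1)\, \gamma_{[\theta^+]}^{Q(p,a,j_1)}\big(\sigma', \tau^+\big) \\
&= \theta_1\, g(p,a,b) + (1-\theta_1) \sum_{j_1} b(j_1)\, v_{[\theta^+]}(l(p,a)) \\
&\leq \sup_{a \in A} \min_{b \in B} \big(\theta_1\, g(p,a,b) + (1-\theta_1)\, v_{[\theta^+]}(l(p,a))\big).
\end{aligned}$$

So $v_{[\theta]}(p) \leq \max_{a \in A} \min_{b \in B} \big(\theta_1 g(p,a,b) + (1-\theta_1)\, v_{[\theta^+]}(l(p,a))\big)$. □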

3.2.4 Player 2 can guarantee $v^*(\pi)$ in $\Gamma(\pi)$.

We first have an analog of lemma 2.22. Recall that a strategy for player 2 is a
sequence $\tau = (\tau_1, \tau_2, \dots, \tau_t, \dots)$, where for each $t$, $\tau_t$ is a mapping from $(D \times J)^{t-1} \times D$
to $\Delta(J)$.

Lemma 3.21. For every $\pi \in \Delta_E$, $n \geq 1$ and $m \geq 0$, $\forall \tau_1, \dots, \tau_m$, $\exists \tau_{m+1}, \dots, \tau_{m+n}$
such that any strategy of player 2 starting by $\tau_1, \dots, \tau_m, \dots, \tau_{m+n}$ is optimal in the
game $\Gamma_{m,n}(\pi)$.

Proof : Fix $\pi$, $m$, $n$, and $\tau_1, \dots, \tau_m$, with $\tau_t : (D \times J)^{t-1} \times D \to \Delta(J)$ for every
$t \leq m$. Define $\mathcal{T}'$ as the set of strategies of player 2 that start with $\tau_1, \dots, \tau_m$.
Let us now consider the zero-sum game $\Gamma'_{m,n}(\pi)$ with strategy set $\Sigma$ for player
1, $\mathcal{T}'$ for player 2, and payoff function (the restriction of) $\gamma_{m,n}$. Stages greater
than $m+n$ do not matter, so $\Gamma'_{m,n}(\pi)$ can be seen as the mixed extension of a
finite game, where nature plays $\tau_1, \dots, \tau_m$ instead of player 2 for the first $m$
stages. Consequently, $\Gamma'_{m,n}(\pi)$ has a value which we denote by $v'_{m,n}(\pi)$. Clearly,
$v'_{m,n}(\pi) \geq v_{m,n}(\pi)$. Now, for any strategy $\sigma$ of player 1, it is easy to construct,
by the recursive formula of proposition 3.20, a strategy $\tau$ in $\mathcal{T}'$ that defends $v_{m,n}(\pi)$
against $\sigma$. So $v'_{m,n}(\pi) = v_{m,n}(\pi)$, and considering an optimal strategy of player 2
in $\Gamma'_{m,n}(\pi)$ concludes the proof. □

Proposition 3.22. For each $\pi$ in $\Delta_E$, player 2 can guarantee $v^*(\pi)$ in the game
$\Gamma(\pi)$.

Proof : Divide the set of stages into consecutive blocks $B^1, B^2, \dots, B^m, \dots$ of equal
length $n$. Define a strategy $\tau$ of player 2 as follows. At block $B^1$, pick $\tau_1, \tau_2, \dots, \tau_n$
in order to get an optimal strategy in $\Gamma_{0,n}(\pi)$. At block $B^2$, use lemma 3.21 to
construct $\tau_{n+1}, \dots, \tau_{2n}$ and get an optimal strategy also in $\Gamma_{n,n}(\pi)$, etc. At block
$B^{m+1}$, given $\tau_1, \dots, \tau_{nm}$, use lemma 3.21 to define $\tau_{nm+1}, \dots, \tau_{n(m+1)}$ to get an optimal
strategy in $\Gamma_{nm,n}(\pi)$.
For any $\sigma$ and $M \geq 1$, we have :

$$\mathbb{E}_{\pi,\sigma,\tau}\Big(\frac{1}{nM} \sum_{t=1}^{nM} g(k_t,i_t,j_t)\Big) \leq \frac{1}{M} \sum_{m=0}^{M-1} v_{nm,n}(\pi) \leq \sup_{m} v_{m,n}(\pi).$$

So player 2 can guarantee $v^*(\pi)$. □
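
The averaging step behind the inequality at the end of this proof can be spelled out as follows. This is only a sketch: it assumes, as definition 3.5 (outside this section) presumably states, that $\gamma_{m,n}$ is the expected average payoff over stages $m+1,\dots,m+n$, and it writes the stage payoff as $g(k_t,i_t,j_t)$, a notation chosen here for illustration.

```latex
% Split the nM first stages into M blocks of length n:
\frac{1}{nM}\sum_{t=1}^{nM} g(k_t,i_t,j_t)
  = \frac{1}{M}\sum_{m=0}^{M-1}\Big(\frac{1}{n}\sum_{t=nm+1}^{n(m+1)} g(k_t,i_t,j_t)\Big).
% By construction \tau is optimal for player 2 in \Gamma_{nm,n}(\pi), so for every \sigma:
\mathbb{E}_{\pi,\sigma,\tau}\Big(\frac{1}{n}\sum_{t=nm+1}^{n(m+1)} g(k_t,i_t,j_t)\Big)
  = \gamma_{nm,n}(\pi,\sigma,\tau) \le v_{nm,n}(\pi) \le \sup_{m'} v_{m',n}(\pi).
```

Averaging the second line over $m = 0,\dots,M-1$ gives the bound for every horizon $nM$, and taking the infimum over the block length $n$ yields $v^*(\pi)$.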

We now have the following inequality chain, for $\pi$ in $\Delta_E$ :

$$\underline{v}(\pi) \leq v^-(\pi) \leq v^+(\pi) \leq \overline{v}(\pi) \leq v^*(\pi).$$



3.2.5 Player 1 can guarantee $v^*(\pi)$ via the auxiliary game.

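For $f$ continuous and $\alpha$ in $[0,1]$, we defined in notation 2.13 the mapping
$\Phi(\alpha, f)$ as follows : $\forall p \in X$,

$$\Phi(\alpha,f)(p) = \sup_{a \in A} \inf_{b \in B} \big( \alpha\, g(p,a,b) + (1-\alpha)\, f(l(p,a)) \big).$$

We now simply define the following subset of mappings from $X$ to $\mathbb{R}$ :

$$\mathcal{D} = \{ v_{[\theta]},\ \theta \in \Delta_f(\mathbb{N}^*) \}.$$

By the recursive formula, we obtain that $\mathcal{D}$ is stable under $\Phi$ : $\forall f \in \mathcal{D}$, $\forall \alpha \in [0,1]$,
$\Phi(\alpha, f) \in \mathcal{D}$. Since $v_1 = \Phi(1,0) \in \mathcal{D}$ and all elements of $\mathcal{D}$ are non expansive,
the hypothesis H5 of section 2 is satisfied.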
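
The stability claim can be checked explicitly. The sketch below assumes, as the recursive formula suggests, that $\theta^+$ denotes the normalized shift of $\theta$; this definition lies outside the present section and is restated here only as an assumption.

```latex
% Assumption: for \theta_1 < 1,
%   \theta^+ = \sum_{t \ge 2} \frac{\theta_t}{1-\theta_1}\, \delta_{t-1}.
% Given f = v_{[\theta']} \in \mathcal{D} and \alpha \in [0,1), put
\theta = \alpha\, \delta_1 + (1-\alpha) \sum_{t \ge 1} \theta'_t\, \delta_{t+1},
\qquad \text{so that} \qquad \theta_1 = \alpha, \quad \theta^+ = \theta'.
% The recursive formula then reads
v_{[\theta]}(p)
  = \sup_{a \in A} \inf_{b \in B} \big( \alpha\, g(p,a,b) + (1-\alpha)\, v_{[\theta']}(l(p,a)) \big)
  = \Phi(\alpha, v_{[\theta']})(p).
% For \alpha = 1 the second term vanishes: \Phi(1,f) = \Phi(1,0) = v_1 \in \mathcal{D}.
```

In both cases $\Phi(\alpha, f)$ is again of the form $v_{[\theta]}$ with $\theta$ of finite support, hence belongs to $\mathcal{D}$.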

Proposition 3.23.

a) For every $u$ in $\Delta_f(X)$, the auxiliary game $\Gamma(u) = (X, A, B, g, l, u)$ satisfies
the hypotheses H1,..., H7 of theorem 2.16.

b) For every $\theta$ in $\Delta_f(\mathbb{N}^*)$ and $u$ in $\Delta_f(X)$, the auxiliary game $\Gamma_{[\theta]}(u)$ defined
in definition 2.19 has a value which corresponds to $v_{[\theta]}(u)$, as defined in notation

c) For $\pi$ in $\Delta_E$, anything that can be guaranteed by player 1 with Markov
strategies in $\Gamma(\hat{\pi})$ can be guaranteed by player 1 in the original game.

Proof : a) H1,...,H7 have been proved. b) The equality between the value functions
of the original game and of the auxiliary game comes from propositions
2.20 and 3.20. c) A Markov strategy of player 1 is a sequence $(\sigma_t)_{t \geq 1}$, where for
each $t$, $\sigma_t$ is a mapping from $X$ to $A$ giving the action to be played on stage $t$
depending on the current state in $X = \Delta(K)$. It induces a probability distribution
on $(X \times A)^{\infty}$, regardless of player 2's actions (see subsection 2.3.3). Any such
strategy can be mimicked by player 1 in the original game, since this player can
compute at each stage the belief of player 2 on the state in $K$. □

Notice that the analog of point c) is not true regarding player 2, because in
the original game this player can compute posterior beliefs on $K$ only if he knows
player 1's strategy.

We can now conclude the proof of theorem 3.4. Fix $\pi$ in $\Delta_E$, and put $u =
\hat{\pi} \in \Delta_f(X)$. For every $\theta$, we have $v_{[\theta]}(\pi) = v_{[\theta]}(u)$ by proposition 3.23 b) and
claim 3.13. So $v^*(\pi) = \inf_n \sup_m v_{m,n}(\pi) = \inf_n \sup_m v_{m,n}(u) = v^*(u)$. By theorem
2.16, $(v_n(u))_n$ converges to $v^*(u)$, and player 1 can guarantee $v^*(u)$ in $\Gamma(u)$ with
a Markov strategy. So we obtain that $(v_n(\pi))_n$ converges to $v^*(\pi)$, and player 1
can guarantee this quantity in the original game $\Gamma(\pi)$. Finally $\Gamma(\pi)$ has a uniform
value which is :

$$\underline{v}(\pi) = v^-(\pi) = v^+(\pi) = \overline{v}(\pi) = v^*(\pi).$$

□

3.3 Comments and consequences 

3.3.1 Byproducts of the proof. 

The proof of theorem 3.4 shows, under the very same hypotheses HA' and
HB', more than the existence of the uniform value.

In particular, the application of theorem 2.16 to the auxiliary game gives :

$$\begin{aligned}
v^*(\pi) &= \inf_{n \geq 1} \sup_{m \geq 0} v_{m,n}(\pi) = \sup_{m \geq 0} \inf_{n \geq 1} v_{m,n}(\pi) \\
&= \inf_{n \geq 1} \sup_{m \geq 0} w_{m,n}(\pi) = \sup_{m \geq 0} \inf_{n \geq 1} w_{m,n}(\pi),
\end{aligned}$$

where $v_{m,n}(\pi)$ is defined in definition 3.5 and $w_{m,n}(\pi) = \inf_{\theta \in \Delta(\{1,\dots,n\})} v_{[\theta^{m,n}]}(\pi)$
(see definition 2.31 and corollary 2.38).

And $(v_n)_n$ uniformly converges to $v^*$ on $\Delta_E$.

Concerning $\varepsilon$-optimal strategies, we have seen that player 1 can guarantee
$v^*(u)$ with Markov strategies, i.e. with strategies that play at each stage a mixed
action determined by player 2's current belief on the current state of nature in
$K$.

Regarding player 2, one can strengthen the construction of proposition 3.22
and show as in remark 2.41 that player 2 has 0-optimal strategies in the game
$\Gamma(\pi)$. Notice that our proof does not tell if player 1 has 0-optimal strategies in
the game $\Gamma(\pi)$ (see example 5.7. in Renault, 2007). □

3.3.2 HA' or HB' cannot be withdrawn in theorem 3.4

An example of a game satisfying HA' and having no uniform value is given in 
Sorin, 1984. It is a particular case of a stochastic game with incomplete informa- 
tion, where after each stage the players perfectly observe the actions just chosen. 
(There are two possible stochastic games of "Big Match" type, and player 1 only 
knows which one is being played.) 

An example of a game satisfying HB' and having no uniform value is given
in Sorin and Zamir, 1985. It belongs to a class of games called repeated games
with incomplete information on one and a half sides : at each stage, both players
will play the same matrix game. Player 1 initially receives a signal which tells
him which matrix game will be played, but does not know the initial signal of
player 2, so cannot deduce from his signal the belief of player 2 on the selected
matrix game. The transition function $q$ is particularly simple there : $q(k,i,j)$ is
the Dirac measure on $(k,j,i)$.

3.3.3 Applications. 

Consider the following model of repeated games with standard monitoring.
There are : a finite set of states $K$, an initial probability $p$ in $\Delta(K)$, finite action
sets $I$ and $J$, for each state $k$ a payoff matrix $(G^k(i,j))_{i,j}$, and for each state $k$
in $K$ and action $i$ in $I$ there is a probability $l(k,i) \in \Delta(K)$. At stage 1, a state $k_1$
is selected according to $p$ and told to player 1 only. Then simultaneously player 1
chooses $i_1$ in $I$ and player 2 chooses $j_1$ in $J$. The payoff for player 1 is $G^{k_1}(i_1,j_1)$,
and $(i_1,j_1)$ is publicly announced. At stage $t \geq 2$, $k_t$ is selected according to
$l(k_{t-1},i_{t-1})$ and told to player 1 only. Then the players choose $i_t$ and $j_t$. The
stage payoff for player 1 is $G^{k_t}(i_t,j_t)$, $(i_t,j_t)$ is publicly announced, and the play
goes to stage $t+1$.
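
To make the belief dynamics of this model concrete, here is a minimal Python sketch of one stage of player 2's belief update: a Bayes step on the observed action of player 1, followed by the transition $l(k,i)$. It assumes that the mixed action $a = (a^k)_k$ of player 1 is known to whoever performs the computation (which is the case for player 1 himself). The function name `belief_update`, the dictionary encoding and the two-state example are illustrative choices, not taken from the paper.

```python
from fractions import Fraction


def belief_update(p, a, l):
    """One-stage belief dynamics on the state (illustrative names).

    p: dict mapping each state k to its current probability p[k].
    a: dict mapping each state k to a dict {action i: probability a[k][i]}.
    l: dict mapping (k, i) to a dict {next state: probability}.
    Returns {i: (probability that i is played, belief on the next state)}.
    """
    result = {}
    actions = sorted({i for k in a for i in a[k]})
    for i in actions:
        prob_i = sum(p[k] * a[k].get(i, 0) for k in p)
        if prob_i == 0:
            continue  # action i is never played under (p, a)
        next_belief = {}
        for k in p:
            weight = p[k] * a[k].get(i, 0) / prob_i  # Bayes posterior on k
            for k_next, q in l[(k, i)].items():
                next_belief[k_next] = next_belief.get(k_next, 0) + weight * q
        result[i] = (prob_i, next_belief)
    return result


half = Fraction(1, 2)
p = {0: half, 1: half}
a = {0: {"T": Fraction(1)}, 1: {"T": half, "B": half}}
# Assumed transition for this example only: the state never changes.
l = {(k, i): {k: Fraction(1)} for k in (0, 1) for i in ("T", "B")}
updated = belief_update(p, a, l)
```

The returned dictionary is a finite-support distribution over beliefs, i.e. an element of the kind that $l(p,a)$ denotes in the auxiliary game. With a transition that leaves the state unchanged, as in this example, the beliefs average back to $p$.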

This model is a generalization of the model of Markov chain repeated games
with lack of information on one side introduced in Renault, 2006. Here, player
1 is not only informed of the sequence of actions, but also he can influence the
state process. Studying this model has led to the present paper, and some ideas
developed here already come from Renault, 2006. It also contains stochastic games
with a single controller and incomplete information on the side of his opponent, as
studied in Rosenberg et al., 2004. So the present paper generalizes both theorem
2.3 in Renault, 2006, and theorem 6 in Rosenberg et al., 2004, and as a consequence
it also generalizes the original existence result of Aumann and Maschler (1995)
for the value of (non stochastic) repeated games with incomplete information on
one side and perfect monitoring.

Notice that our result does not imply the existence of the value for models
where player 1 receives signals without having a perfect knowledge of the belief
of player 2 on the state (see Aumann and Maschler, 1995, or Zamir, 1992 for repeated
games with lack of information on one side, or Neyman, 2008 for Markov chain repeated
games with lack of information on one side). When the state is uncontrolled, more
flexibility on the signalling structure can be allowed.

3.3.4 Open problems. 

1. We have seen that hypotheses HA' and HB' cannot be withdrawn in theorem
3.4. However, strengthening HB' into HB may allow one to weaken HA' into the
following hypothesis.

Hypothesis HA'' : Player 1 is more informed than player 2, i.e. there exists a
mapping $\hat{d} : C \to D$ such that : if $E$ denotes $\{(k,c,d) \in K \times C \times D,\ \hat{d}(c) = d\}$,
we have : $\pi(E) = 1$, and $q(k,i,j)(E) = 1$, $\forall (k,i,j) \in K \times I \times J$.
If we only assume that HA'' and HB' hold, it may be the case that player 2
controls player 1's signal, hence in some sense player 2 may "manipulate" player
1's knowledge of the state. So our proof does not apply here, and in our opinion
the value most likely fails to exist in general.

The situation is different if we assume HA'' and HB. Player 1 always has
more information than player 2 about the state, but the set $X = \Delta(K)$ is not
sufficient to characterize, after each stage, the difference of information from
player 1 to player 2. The natural state space here may rather be the set
$\{(u,v) \in \Delta_f(X) \times \Delta_f(X),\ u \preceq v\}$. Does the uniform value exist in this case?

2. In general, recall that lemma 3.6 is true without hypotheses, so in particular
the $n$ stage values $v_n$ always exist. There is no known example of a zero-sum
repeated game (defined with finite data exactly as in subsection 3.1) where $\lim_n v_n$
does not exist.

3. Assume that player 1 always has more information than player 2, i.e. that
player 1 can deduce from his signal both the signal and the action of player 2.
This is the case, e.g., when HA holds. It has been conjectured by Mertens, Sorin
and Zamir (see Sorin, 2002, 6.5.8. p. 147, or Mertens et al., 1994, Part C, p. 451)
that for such repeated games, the limit of $(v_n(\pi))_n$ exists and can be guaranteed
by player 1 in $\Gamma(\pi)$.

The approach used here might help to prove the conjecture. An important step
would be to obtain an analog of the result on dynamic programming (Renault,
2007) for two-player stochastic games with deterministic transitions and
action-independent payoffs. More precisely, let $Z$ be a state space, $A$ and $B$ be action
sets, $r$ be a payoff function from $Z$ to $[0,1]$, and $f$ be a transition from $Z \times A \times B$
to $Z$. At each stage, if the current state is $z$, simultaneously player 1 chooses $a$
and player 2 chooses $b$. Player 1's payoff is $r(z)$, and $(a,b)$ and the new state
$f(z,a,b)$ are publicly announced. $Z$ being a precompact metric space, can we
find "nice" uniform equicontinuity conditions on some auxiliary value functions
that will ensure the existence of the uniform value ?
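
To fix ideas on the model just described, the following Python sketch plays such a game for $n$ stages and returns the average payoff. It is only an illustration under simplifying assumptions: both players use pure strategies, given as functions of the stage, the current state and the public history (mixed actions are ignored). The names `average_payoff`, `strategy1`, `strategy2` and the three-state example are hypothetical.

```python
def average_payoff(z0, f, r, strategy1, strategy2, n):
    """Average payoff over n stages with deterministic transition f and
    action-independent stage payoff r (pure strategies only)."""
    z, total, history = z0, 0.0, []
    for t in range(1, n + 1):
        a = strategy1(t, z, tuple(history))  # action of player 1
        b = strategy2(t, z, tuple(history))  # action of player 2
        total += r(z)                        # payoff depends on the state only
        history.append((z, a, b))            # actions and states are public
        z = f(z, a, b)                       # deterministic transition
    return total / n


# Hypothetical example: states 0, 1, 2; the state moves by a + b modulo 3,
# and player 1 is paid 1 exactly when the state is 0.
def f(z, a, b):
    return (z + a + b) % 3


def r(z):
    return 1.0 if z == 0 else 0.0


result = average_payoff(0, f, r, lambda t, z, h: 1, lambda t, z, h: 0, 6)
```

In this example the state cycles through 0, 1, 2, so the payoff 1 is collected once every three stages.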